This podcast episode with Reiner Pope, CEO of MatX, examines the mathematical and architectural principles that govern how Large Language Models (LLMs) are trained and served. It explores how hardware capabilities (memory bandwidth, compute performance, interconnects) interact with model design choices (batch size, context length, sparsity) to produce trade-offs among latency, cost, and model performance. The discussion aims to explain why AI models behave as they do and how these factors determine the pricing of LLM APIs.