Category: MLOps
-
Beyond GPUs: Mastering Ultra-Scale LLM Training – Part 2
Introduction: In the first part of this series, we unpacked the big picture of scaling LLM training, covering the “why” and “what” behind ultra-scale setups and how different forms of parallelism come together to make training trillion-parameter models possible at all. That gave us the map. Now it’s time to get into the weeds of the “how.”…
-
Beyond GPUs: Mastering Ultra-Scale LLM Training – Part 1
Introduction: Training today’s largest language models demands massive computational resources, often thousands of GPUs humming in perfect harmony, orchestrated to act as one. Until recently, only a few elite research labs could marshal such “symphonies” of compute power. The open-source movement has started to change that by releasing model weights (like Llama or DeepSeek) and…
-
Supercharging Your Inference of Large Language Models with vLLM (Part 2)
As discussed in part 1 of this blog post, vLLM is a high-throughput distributed system for serving large language models (LLMs) efficiently. It addresses the challenge of memory management in LLM serving systems by introducing PagedAttention, an innovative attention algorithm inspired by virtual memory techniques in operating systems. This approach allows for near-zero waste in…
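For readers skimming this listing, a minimal sketch of vLLM’s offline-inference API is shown below; the model name, sampling settings, and gpu_memory_utilization value are placeholder assumptions for illustration, not values taken from the post itself.

```python
# Minimal vLLM offline-inference sketch (illustrative; model id and
# parameter values are placeholder assumptions, not from the post).
from vllm import LLM, SamplingParams

prompts = [
    "Explain PagedAttention in one sentence.",
    "Why does KV-cache fragmentation hurt serving throughput?",
]

# Sampling settings for generation; PagedAttention manages the KV cache
# in fixed-size blocks under the hood, so memory is allocated on demand
# rather than pre-reserved per sequence.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # any Hugging Face model id
    gpu_memory_utilization=0.90,               # fraction of GPU memory vLLM may use
)

for output in llm.generate(prompts, sampling_params):
    print(output.prompt, "->", output.outputs[0].text)
```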