Eric Jang – Building AlphaGo from scratch

Dwarkesh Podcast · May 20, 2026

OVERVIEW

In this episode, Eric Jang describes building a simplified version of AlphaGo from scratch, covering the rules of Go and how Monte Carlo Tree Search is combined with deep neural networks. The conversation then broadens to AI research more generally: efficient training methods, scaling laws, and the potential for AI tools to automate scientific discovery, with Go serving as a case study in tackling complex problems.

KEY TOPICS

The significance of AlphaGo and the game of Go.
Rules and scoring of the game of Go.
Monte Carlo Tree Search (MCTS) algorithm.
Deep neural network architectures (ResNets, Transformers).
Value function and policy network in AlphaGo.
Training process of AlphaGo (supervised learning, self-play, distillation).
Challenges in AI research: exploration vs. exploitation, variance, credit assignment.
Scaling laws and the "bitter lesson" in AI development.
Automated scientific research using AI assistants/LLMs.
Off-policy vs. on-policy reinforcement learning.
Bits per flop and bits per sample for learning efficiency.
The concept of chaos and macroscopic structure in complex systems.
Comparing Go to other game/AI domains (e.g., poker, StarCraft).
Best response training and neural fictitious self-play.
Local vs. global verification in AI research.

MAIN TAKEAWAYS

AlphaGo's breakthrough stemmed from successfully applying deep learning to Go, a game previously considered intractable for AI due to its immense search space. The core idea was to use neural networks to "amortize" a very deep search, allowing the system to make smart moves without exhaustively exploring every possibility.
The Monte Carlo Tree Search (MCTS) algorithm, guided by a value network (predicting win/loss) and a policy network (suggesting good moves), is central to AlphaGo's decision-making. The value network limits how deep the search must go by estimating the eventual outcome of a position, while the policy network limits how wide it must go by directing exploration towards promising moves.
Distillation, in which a stronger "teacher" (often produced by self-play) is used to train a simpler "student" model, is a powerful technique for improving efficiency and reducing the computational cost of training a Go AI. This highlights that "good enough" solutions can be bootstrapped towards stronger performance.
The "bitter lesson" of AI research is that general methods which scale with compute, such as search and learning, tend to outperform intricate hand-designed approaches. This is reflected in AlphaGo's success, which demonstrates that massive computation applied to fundamental algorithms can overcome complex problems.
Automated research assistants, leveraging LLMs, can significantly accelerate the scientific process by assisting with hypothesis generation, code writing, experiment execution, data analysis, and report generation. This suggests a future where AI actively participates in scientific discovery.
The efficiency of learning can be understood through "bits per flop" and "bits per sample." While increasing computational power (flops) is one axis of scaling, optimizing the information gained per sample (bits per sample) is another crucial aspect, often enhanced by techniques like distillation and reducing variance.
The ability of neural networks to find macroscopic structure in complex, chaotic systems (like Go or weather patterns) allows for surprisingly accurate predictions of high-level outcomes, even if micro-level details remain unpredictable. This transforms seemingly intractable problems into manageable ones.
Effective AI research methodology in the age of powerful models involves designing robust experimental setups, verifying local improvements rigorously, and intelligently structuring "outer loop" and "inner loop" processes. The goal is to maximize learning efficiency and avoid getting stuck in suboptimal local minima.
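To make the MCTS takeaway above more concrete, here is a minimal Python sketch of a PUCT-style selection rule of the kind used in AlphaGo-style systems, where the policy network's prior steers exploration and backed-up value estimates steer exploitation. It is an illustration only, not the implementation discussed in the episode: the `Node` class, the function names, the constant `c_puct = 1.5`, and the sign convention (each node stores value from the point of view of the player who moved into it) are all assumptions made for this example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One position in the search tree (hypothetical structure for this sketch)."""
    prior: float                 # policy-network probability of the move leading here
    visits: int = 0              # number of simulations that passed through this node
    value_sum: float = 0.0       # sum of backed-up values, from the mover's perspective
    children: dict = field(default_factory=dict)  # move -> Node

    def mean_value(self) -> float:
        # Unvisited nodes default to a neutral estimate of 0.
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.5) -> float:
    """Exploitation term (mean value) plus a prior-weighted exploration bonus."""
    exploration = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.mean_value() + exploration


def select_move(parent: Node, c_puct: float = 1.5):
    """Pick the child move with the highest PUCT score."""
    return max(parent.children,
               key=lambda move: puct_score(parent, parent.children[move], c_puct))


def backup(path: list, leaf_value: float) -> None:
    """Propagate a value estimate from the leaf to the root.

    leaf_value is from the perspective of the player who moved into the leaf,
    so the sign flips at every step up the tree.
    """
    value = leaf_value
    for node in reversed(path):
        node.visits += 1
        node.value_sum += value
        value = -value


# Usage: a high prior wins at first, then a bad value estimate redirects the search.
root = Node(prior=1.0, visits=1)
root.children = {"a": Node(prior=0.7), "b": Node(prior=0.3)}
first = select_move(root)                      # "a": the higher prior dominates
backup([root, root.children[first]], -1.0)     # value estimate says "a" was bad
second = select_move(root)                     # "b": the search moves on
```

The exploration bonus shrinks as a child accumulates visits, so the policy prior matters most early in the search and the value estimates take over later.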

NOTABLE QUOTES

"AlphaGo and Go AI is one of those things that really got me into the field... it was just profound to see, you know, how smart AI systems could become and the kind of computational complexity class that they could tackle with deep learning."

"You can kind of lose the battle but win the war... as the board size increases, the complexity of these kind of like micro versus macro dynamics gets more interesting."

"A 10-layer neural network pass... is able to amortize and approximate to a very, very high fidelity a nearly intractable search problem."
