I Spent a Week Studying Karpathy's Zero to Hero
Apr 18, 2026
Karpathy's Neural Networks: Zero to Hero is one of the best practical starting points for anyone who wants to seriously learn AI programming.
If you want to do AI-related programming work, whether that means large language models or more traditional analytical AI like classification and regression, one of the best places to start is Andrej Karpathy’s YouTube series, Neural Networks: Zero to Hero.
Each video is about two hours long. But in my experience, that is misleading. To really follow along, understand the ideas, and reproduce the exercises yourself, each video takes closer to ten hours.
That is also the reason I did not post any updates this past week. I spent the whole week working through the first four videos.
So far, they cover:
- the basic principles of neural networks and backpropagation
- building the simplest language model with bigrams in makemore
- building makemore again with an MLP
- activation functions, gradients, and BatchNorm
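To give a flavor of the bigram idea from the second video, here is a tiny sketch of counting character pairs and turning them into probabilities. It is not Karpathy's actual code, and the three names are a stand-in for the real dataset he uses:

```python
# minimal bigram character model: count adjacent pairs, normalize to probabilities
words = ["emma", "olivia", "ava"]          # tiny stand-in for the names dataset
counts = {}
for w in words:
    chars = ['.'] + list(w) + ['.']        # '.' marks the start and end of a word
    for a, b in zip(chars, chars[1:]):
        counts[(a, b)] = counts.get((a, b), 0) + 1

# probability of 'v' following 'a', among all bigrams that start with 'a'
total_a = sum(c for (a, _), c in counts.items() if a == 'a')
p = counts[('a', 'v')] / total_a
print(round(p, 3))  # 0.25
```

In the lectures this same counting idea is then reframed as a one-layer neural network trained with gradient descent, which is what makes the progression to the MLP feel so natural.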
Next week I plan to keep going and continue with the part where Karpathy builds GPT from scratch.
I had known about this lecture series for a long time, but only recently decided to seriously sit down and study it. Once I started, I was completely absorbed. Karpathy is genuinely an exceptional teacher. He explains difficult ideas in a way that is both precise and approachable.
I had already seen many of these neural network and machine learning concepts before, in books, articles, and scattered tutorials. But I can honestly say that none of those gave me the same level of understanding as following Karpathy’s lectures step by step and implementing the ideas myself.
A Few Suggestions If You Want to Follow Along
If you are also thinking about learning from this series, here are a few practical suggestions.
1. Use Miniconda to manage your Python environment
This is not specific to Karpathy’s course, but it makes setup much easier. A clean environment helps avoid a lot of unnecessary trouble.
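If you go that route, a typical setup looks something like this. The environment name and Python version are just examples, not requirements of the course:

```shell
# create and activate an isolated environment for the course
conda create -n zero-to-hero python=3.11 -y
conda activate zero-to-hero

# install the libraries the lectures rely on
pip install torch jupyter matplotlib
```

If anything breaks later, you can delete the environment and recreate it without touching the rest of your system.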
2. Write the exercises in Jupyter Notebook
Karpathy writes the practice code in Jupyter Notebook, which is a good fit for this kind of learning. You can code in the browser exactly as he does in the videos.
If you are lazy like me, another workable option is to install the Jupyter extension in VS Code and write notebooks there.
I also tested a few AI coding assistants for notebook work. So far, I have found that only GitHub Copilot really supports code completion inside notebooks in a useful way.
The free Copilot tier gives only a limited number of completions. Unlimited completions require the paid plan, which costs about $10 per month.
If you already subscribe to Gemini or Codex, another option is to write code in a .py file first, generate or refine it there, and then paste it back into the notebook.
3. Use a GPU if you have one
The course uses PyTorch. Karpathy's original code does not rely on a GPU, but in practice some of the experiments run fairly slowly on a CPU.
If you have access to a GPU, it is worth using it. In a notebook, you can detect the device during initialization and then place tensors on that device from the start.
For example:

```python
import torch

# pick the GPU when available, otherwise fall back to CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
bngain = torch.ones((1, n_hidden), device=device)
```
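To show the pattern end to end, here is a self-contained sketch. The sizes `n_embd` and `n_hidden` and the tensor shapes are placeholder values for illustration, not Karpathy's actual configuration; the point is that every tensor is created on `device`, so the whole forward pass runs on the GPU when one is present:

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# placeholder sizes, just to show the pattern
n_embd, n_hidden = 10, 200
C = torch.randn((27, n_embd), device=device)       # character embedding table
W1 = torch.randn((n_embd * 3, n_hidden), device=device)
x = torch.randint(0, 27, (32, 3), device=device)   # a batch of 3-character contexts

emb = C[x].view(32, -1)        # gather embeddings, flatten the context window
h = torch.tanh(emb @ W1)       # hidden layer; runs on GPU when available
print(h.shape)                 # torch.Size([32, 200])
```

Creating tensors on the right device from the start avoids scattering `.to(device)` calls through the code later, which is easy to forget in a notebook.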
You can also refer to this example notebook:
benyue1978/zero-to-hero/build_makemore_mlp2.ipynb
Why This Series Is So Good
What makes Karpathy’s series especially impressive to me comes down to two things.
First, he is simply a very good teacher. He explains things clearly, and when something important comes up, he does not leave it as an abstract concept. He writes real code and demonstrates the idea directly.
Second, he chose a set of classic and important papers, then reconstructs the development of neural networks and major concepts by following those papers step by step. That makes the series much more than just a coding tutorial. It becomes a guided tour through the logic behind the field.
That is why I strongly recommend it.
If you want to work on model training or fine-tuning in the future, I think following Karpathy’s approach is a very good way to build fundamentals. After that, you can use the same method to study newer papers.
If you do not want to go too deep, even just watching the first video is worthwhile. It gives you a fast and surprisingly solid intuition for machine learning, gradients, and gradient descent.
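If the phrase "gradient descent" is still abstract, here is a toy illustration of the core loop, unrelated to the video's actual code: repeatedly nudge a parameter against its gradient and it settles at the minimum.

```python
# toy gradient descent on f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
x = 0.0      # starting guess
lr = 0.1     # learning rate

for _ in range(100):
    grad = 2 * (x - 3)   # derivative of the loss at the current x
    x -= lr * grad       # step downhill, against the gradient

print(round(x, 2))  # 3.0, the minimum of f
```

The first video builds this same idea up from scratch, including computing the gradients automatically via backpropagation rather than by hand.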
Of course, to follow the series comfortably, it helps to know a bit of calculus and linear algebra.
If English is a barrier, you can also find translated versions of the same series on Bilibili. There are quite a few.
One Small Reflection
There is also a more personal feeling behind this week.
Over the past few years, as AI has grown stronger and stronger, I have noticed something happening in myself at the same time. I am still learning and trying to keep up with the pace of AI development, yet I can feel my patience, and my willingness to understand things deeply, growing weaker.
Maybe this is just an energy-saving instinct. If I can delegate something to AI, I naturally do not want to spend the effort learning it myself.
I am not even sure whether that is wrong.
And if I keep following that thought, it leads to another question: when should children start using AI, and how should they be introduced to it?
At least this week reminded me of one thing: learning still brings dopamine. And maybe that is reason enough to keep going.
The ideas in this post are mine; Codex helped me write it.
If you'd like to follow what I'm learning about AI tools and workflows, you can subscribe here → Subscribe to my notes