01 / task
Pick a task
Before any code, the most important question: what is the kid going to learn to do? We picked the simplest task that still produces something interesting:
Given some text, predict the next character.
That's it. No grammar rules. No dictionary. Just — here are some letters; what comes next? It turns out that if you get very good at this one task, on enough text, you accidentally learn an enormous amount about language along the way.
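To make that concrete, here is a tiny pure-Python sketch (the string is illustrative, not from the training data): every position in a piece of text is one training example, with the characters so far as the input and the very next character as the target.

```python
# Every position in a string gives one training example:
# the characters so far -> the next character.
s = "predict"
examples = [(s[:i], s[i]) for i in range(1, len(s))]
for ctx, nxt in examples:
    print(f"{ctx!r} -> {nxt!r}")
# 'p' -> 'r', 'pr' -> 'e', 'pre' -> 'd', ...
```

One seven-character word already yields six examples; a million characters of text yields roughly a million.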
Step 1: get some text
We use the same dataset Andrej Karpathy uses in his classic char-rnn tutorial: a 1.1 MB plaintext file containing the complete works of William Shakespeare, 1,115,394 characters in all. Here's the first chunk of it:
import os
import urllib.request

URL = "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"
if not os.path.exists("input.txt"):
    print("Downloading tinyshakespeare...")
    urllib.request.urlretrieve(URL, "input.txt")
text = open("input.txt").read()

Step 2: turn it into (input, target) pairs
To train the kid to predict the next character, we need lots of examples of "here's some text → here's what came next." The trick is: we don't need to label anything by hand. The text labels itself. The target is just the input shifted by one character.
def get_batch(split):
    """Sample BATCH_SIZE random chunks of length BLOCK_SIZE.
    x is the input, y is x shifted by one (the next-char target)."""
    d = train_data if split == "train" else val_data
    ix = torch.randint(len(d) - BLOCK_SIZE - 1, (BATCH_SIZE,))
    x = torch.stack([d[i:i + BLOCK_SIZE] for i in ix])
    y = torch.stack([d[i + 1:i + BLOCK_SIZE + 1] for i in ix])
    return x.to(DEVICE), y.to(DEVICE)
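get_batch assumes train_data and val_data already exist. Here is a minimal sketch of how they might be built; the character vocabulary and the 90/10 split are assumptions for illustration, not taken from the snippet above, and the real pipeline would wrap the integer lists in torch.tensor:

```python
text = "First Citizen: Before we proceed any further, hear me speak."

# Build a character vocabulary and a reversible char<->int mapping.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}   # char -> int
itos = {i: ch for ch, i in stoi.items()}       # int -> char
encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

data = encode(text)              # the whole corpus as integers
n = int(0.9 * len(data))         # hold out the last 10% for validation
train_data, val_data = data[:n], data[n:]

print(decode(encode("hear me")))   # round-trips back to 'hear me'
```

Holding out a contiguous tail of the text (rather than random characters) keeps the validation set from overlapping the chunks sampled for training.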