03 / embeddings
Give each letter a meaning
An ID like 7 isn't useful by itself. The number 7 is one integer apart from 6 and 8 — but the character at position 7 in our vocab has nothing in common with positions 6 and 8.
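A minimal sketch makes the point concrete. This assumes the usual convention of sorting the text's unique characters into an ID table; the filename input.txt is hypothetical, not a detail from this section:

```python
text = open("input.txt").read()               # hypothetical training text
chars = sorted(set(text))                     # e.g. 65 distinct characters
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> integer ID

# IDs 6, 7, 8 are neighbors only by accident of sort order;
# the characters they name share nothing.
print(chars[6], chars[7], chars[8])
```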
So we hand each letter 128 numbers: a vector. They start out completely random; training nudges them until each letter's 128 numbers somehow encode what that letter "means." The full table is 65 × 128 = 8,320 learned numbers.
We don't pick what the dimensions mean. The kid invents them. Slot 0 means whatever is most useful for predicting the next character; same for slot 1, slot 2, all 128.
```python
self.tok_emb = nn.Embedding(VOCAB, N_EMBD)  # what each char means
```

That one line creates the table. The forward pass is just an indexing operation: hand it a character ID and it returns that row of 128 numbers.
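Here is that lookup as a runnable sketch; everything outside `nn.Embedding` itself (the variable names, the example IDs) is illustrative:

```python
import torch
import torch.nn as nn

VOCAB, N_EMBD = 65, 128                  # sizes from the text

tok_emb = nn.Embedding(VOCAB, N_EMBD)    # 65 x 128 table, randomly initialized
print(tok_emb.weight.shape)              # torch.Size([65, 128])

ids = torch.tensor([7, 6, 8])            # three character IDs
vectors = tok_emb(ids)                   # lookup: one 128-dim row per ID
print(vectors.shape)                     # torch.Size([3, 128])
```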
See the actual numbers
Pick two letters. Each one's 128-dim vector is shown as 128 vertical bars — taller bar = stronger signal on that dimension, up = positive, down = negative. The same dimension means the same thing for every letter. That's why you can compare them.
The bars come straight from the trained checkpoint, kid.pt. We don't know what each dimension means in human terms — it might be "is this a consonant" or "does this start a proper noun" or something we can't name. But whatever the kid invented, it was useful enough to bring the loss from 4.33 down to 1.48.
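If you want to poke at the trained vectors yourself, here's a hedged sketch. It assumes kid.pt stores a plain state dict with the embedding table under the key "tok_emb.weight"; both are guesses, so adjust to whatever the checkpoint actually contains:

```python
import torch
import torch.nn.functional as F

# Assumed layout: a state dict with the table at "tok_emb.weight".
state = torch.load("kid.pt", map_location="cpu")
table = state["tok_emb.weight"]          # shape (65, 128)

a, b = table[7], table[8]                # two letters' 128-dim vectors
print(F.cosine_similarity(a, b, dim=0))  # how aligned are they?
```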