04 / position
Tell the model where things are
Embeddings give every letter a meaning vector. But we've lost something important: order. The letter "E" five characters into a line should mean something different from an "E" at the very start of one. Right now they get the same 128 numbers.
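To make the problem concrete, here is a toy check (the 65-character vocabulary and the token ids below are made-up values, not taken from the model):

import torch
import torch.nn as nn

tok_emb = nn.Embedding(65, 128)              # hypothetical 65-char vocab, 128-dim vectors
idx = torch.tensor([[17, 3, 9, 22, 5, 17]])  # made-up ids: the same character (17) at positions 0 and 5
vecs = tok_emb(idx)                          # shape (1, 6, 128)
print(torch.equal(vecs[0, 0], vecs[0, 5]))   # True: both lookups return the same table row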
The fix is simple and slightly unbelievable: make a second embedding table, this one indexed by position instead of by character. Then add the two vectors together. That's it. Each position 0, 1, 2, …, 127 gets its own learned 128-number vector that is added on top.
Why does adding work? Because the model has 4 layers of attention after this to disentangle the "what" from the "where." Both signals are baked into the same vector, and training figures out how to use them.
self.tok_emb = nn.Embedding(VOCAB, N_EMBD) # what each char means
self.pos_emb = nn.Embedding(BLOCK_SIZE, N_EMBD) # where it is in the sequence
# in forward():
pos = torch.arange(T, device=idx.device)
x = self.tok_emb(idx) + self.pos_emb(pos)
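To sanity-check the sum outside the model, here is a standalone version of the same idea (VOCAB = 65 is an assumption; N_EMBD and BLOCK_SIZE are 128, matching the numbers above):

import torch
import torch.nn as nn

VOCAB, N_EMBD, BLOCK_SIZE = 65, 128, 128
tok_emb = nn.Embedding(VOCAB, N_EMBD)
pos_emb = nn.Embedding(BLOCK_SIZE, N_EMBD)

idx = torch.tensor([[17, 3, 9, 22, 5, 17]])  # toy ids again: character 17 at positions 0 and 5
pos = torch.arange(idx.shape[1])             # positions 0 .. 5
x = tok_emb(idx) + pos_emb(pos)              # (1, 6, 128) + (6, 128): broadcasts over the batch
print(torch.equal(x[0, 0], x[0, 5]))         # False: the repeated character now gets two different vectors

The broadcast works because pos_emb(pos) has shape (T, 128), so the same position vectors are reused for every sequence in the batch.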
See it for one prompt
Below is the prompt "ROMEO: To be". Click any character to see (a) its token embedding, (b) the position embedding for that slot, and (c) the sum that actually enters the first transformer block.