LLM Mechanics

Tokens, next-token prediction, temperature, pre/post-training, and why scale matters.

Your phone’s keyboard guesses your next word. An LLM is the same idea, scaled up: trained on a large share of the internet’s text and steered by billions of tiny dials.

The simple idea

Type “I’m running late, I’ll be there in ___” and your keyboard offers “5”, “10”, “a”. It’s predicting what usually comes next. An LLM does this for whole paragraphs, emails, and code.

Five layers, each building on the last

1. Tokens - what the model actually reads

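Models don’t read letter by letter or word by word; they read tokens - chunks that average roughly 4 characters (~0.75 of a word) in English. Short common words are often a single token, while longer words get split into several. Cost and length limits are counted in tokens.

unbelievable = 1 word, 3 tokens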

Visual. A word snaps into coloured token chips.
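To make the splitting concrete, here is a minimal sketch of a greedy longest-match tokenizer. The vocabulary `TOY_VOCAB` and both function names are invented for illustration; real tokenizers learn a far larger vocabulary from data and may split the same word differently.

```python
# Hypothetical toy vocabulary; real models learn a much larger one from data.
TOY_VOCAB = {"un", "believ", "able", "run", "ning", "late"}


def tokenize(word, vocab=TOY_VOCAB):
    """Split a word greedily, always taking the longest matching piece."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking until one is in the vocab.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # Nothing matched: fall back to a single-character token.
            tokens.append(word[i])
            i += 1
    return tokens


def estimate_tokens(text):
    """Rule of thumb from the text above: about 4 characters per token."""
    return max(1, round(len(text) / 4))


print(tokenize("unbelievable"))         # ['un', 'believ', 'able']
print(estimate_tokens("unbelievable"))  # 3
```

The estimate is only a budgeting shortcut. For billing or length limits, the count that matters is whatever the model’s own tokenizer produces.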

2. Next-token prediction - how it generates

At each step the model assigns a probability to every token in its vocabulary (typically tens of thousands of options), samples one, appends it to the text, and repeats. After “The capital of France is”, the token “Paris” gets a huge probability and “banana” gets almost none.
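The score, sample, repeat loop can be sketched in a few lines. Everything here is a stand-in: `TOY_MODEL` is a hypothetical lookup table with made-up scores, and it looks only at the last token, whereas a real model scores its whole vocabulary using all the text so far.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores.values())
    exps = {tok: math.exp(s - top) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Hypothetical scores, keyed by the last token only (a big simplification).
TOY_MODEL = {
    "is": {"Paris": 9.0, "Lyon": 4.0, "banana": -2.0},
    "Paris": {".": 6.0, ",": 3.0},
}


def generate(prompt_tokens, steps, rng):
    """Repeat: score the candidates, sample one, append it."""
    tokens = list(prompt_tokens)
    for _ in range(steps):
        scores = TOY_MODEL.get(tokens[-1])
        if scores is None:  # the toy table has nothing to say here
            break
        probs = softmax(scores)
        choice = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
        tokens.append(choice)
    return tokens


probs = softmax(TOY_MODEL["is"])
for tok, p in probs.items():
    print(f"{tok:>7}: {p:.5f}")

rng = random.Random(0)  # seeded so a run is repeatable
print(generate(["The", "capital", "of", "France", "is"], steps=2, rng=rng))
```

With these made-up scores “Paris” takes more than 99% of the probability, so sampling almost always picks it, but “banana” is never strictly impossible.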

3. Temperature - the randomness dial

Temperature controls how adventurous the sampling is. Low = almost always pick the most likely token (focused, consistent). High = give less likely tokens a real chance (creative, but riskier). Prompt “a tagline for a coffee shop”: low temp → “Great coffee, every day.” High temp → “Sip the sunrise.”
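One common way to implement this dial is to divide the model’s raw scores by the temperature before turning them into probabilities. The sketch below assumes that approach; the three candidate tokens and their scores are made up for illustration.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Divide the scores by the temperature, then normalise to probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical first tokens of a coffee-shop tagline, with made-up scores.
tokens = ["Great", "Fresh", "Sip"]
scores = [3.0, 2.0, 1.0]

for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(scores, temperature)
    row = ", ".join(f"{t} {p:.2f}" for t, p in zip(tokens, probs))
    print(f"T={temperature}: {row}")
```

At a low temperature nearly all the probability piles onto the top-scoring token; at a high temperature the three options move closer together, which is why the output gets more varied.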

4. How it learned - two phases

Visual. From internet-scale text to a helpful assistant.

Pre-training: read trillions of words, just predicting the next token. No labels needed.
Post-training (fine-tuning + RLHF): coach it to be helpful, honest, and safe.
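The “no labels needed” point is easiest to see in a toy: raw text supplies its own training pairs, because each word is the answer for the word before it. The sketch below simply counts word pairs instead of training a neural network, which is a deliberate simplification; the corpus and function names are made up for illustration.

```python
from collections import Counter, defaultdict


def training_pairs(text):
    """Pair each word with the word that follows it: the text labels itself."""
    words = text.split()
    return list(zip(words[:-1], words[1:]))


def fit_counts(text):
    """'Train' by counting which word follows which."""
    counts = defaultdict(Counter)
    for current, following in training_pairs(text):
        counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower seen in training, or None if unseen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


corpus = "the cat sat on the mat . the cat ran ."
counts = fit_counts(corpus)

print(training_pairs("the cat sat"))  # [('the', 'cat'), ('cat', 'sat')]
print(predict_next(counts, "the"))    # cat
```

A real model replaces the counting table with billions of adjustable parameters and looks at much more than one previous word, but the training signal is the same: predict what comes next, then check against the text.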

5. Scale and parameters

Parameters are the model’s adjustable dials, set during training - modern models have billions. More parameters + more data + more compute → more capability. That is the “scaling” story behind the leaps you’ve seen.
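To get a feel for how the dials add up, here is a rough counting sketch for a stack of plain fully connected layers. This is an assumption made for illustration: real LLMs use transformer layers with a different structure, and the layer sizes below are invented.

```python
def dense_layer_params(n_in, n_out):
    """One fully connected layer: a weight per input-output pair, plus a bias per output."""
    return n_in * n_out + n_out


def network_params(layer_sizes):
    """Total dials in a stack of fully connected layers."""
    pairs = zip(layer_sizes[:-1], layer_sizes[1:])
    return sum(dense_layer_params(n_in, n_out) for n_in, n_out in pairs)


# Hypothetical sizes: 512 inputs, a hidden layer of 2048, 512 outputs.
total = network_params([512, 2048, 512])
print(f"{total:,}")  # 2,099,712
```

Even this tiny two-layer stack has about two million dials, which gives a sense of how wide and deep a network must be to reach billions.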

Interactive

Be the model

Set the temperature, then sample the next token - just like an LLM.

The weather today is ___

Temperature: 0.70, on a slider from Focused (consistent) to Creative (riskier)

sunny: 60%
cloudy: 19%
rainy: 11%
warm, cold, unpredictable, banana: smaller shares

Low temperature makes the bars peaky - it almost always picks “sunny.” Crank it up and the bars flatten, so even “banana” gets a turn.

Recap