AI Training
Level 1 · Generative AI Literacy
Lesson 1.2 · Beginner · 12 min

LLM Mechanics

Tokens, next-token prediction, temperature, pre/post-training, and why scale matters.

What you’ll be able to do
  • Explain what a token is and how an LLM turns text into tokens.
  • Describe next-token prediction and why temperature changes answers.
  • Explain pre-training vs post-training (including RLHF) in plain terms.
  • Say what billions of parameters means and why scale matters.

Your phone’s keyboard guesses your next word. An LLM is the same idea, scaled up: trained on a large share of the internet’s text, with billions of tiny dials to adjust.

The simple idea

Type “I’m running late, I’ll be there in ___” and your keyboard offers “5”, “10”, “a”. It’s predicting what usually comes next. An LLM does this for whole paragraphs, emails, and code.

Five layers, each building on the last

1. Tokens - what the model actually reads

Models don’t see letters or whole words; they see tokens - chunks that average roughly 4 characters (~0.75 of a word) in English text. Cost and length limits are counted in tokens, not words.

unbelievable = 1 word, 3 tokens (the exact split depends on the tokenizer)
Visual. A word snaps into coloured token chips.
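
To make the chunking concrete, here is a minimal sketch of a greedy longest-match tokenizer. The vocabulary `TOY_VOCAB` and the functions `toy_tokenize` and `estimate_tokens` are hypothetical, chosen only so that “unbelievable” splits into three pieces; real tokenizers learn their vocabulary from data and may split the word differently.

```python
# Toy vocabulary (an assumption for illustration, not a real model's vocabulary).
TOY_VOCAB = {"un", "believ", "able", "the", "cat"}

def toy_tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match split; unknown characters become one-character tokens."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shrink.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary entry matched: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens

def estimate_tokens(text):
    """Rough rule of thumb from the lesson: about 4 characters per token."""
    return max(1, round(len(text) / 4))

print(toy_tokenize("unbelievable"))    # ['un', 'believ', 'able']
print(estimate_tokens("unbelievable")) # 3
```

The rule-of-thumb estimate is handy for budgeting cost and length, but only a model’s own tokenizer gives the exact count.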

2. Next-token prediction - how it generates

At each step the model assigns a probability to every token in its vocabulary (typically tens of thousands of candidates), samples one, appends it to the text, and repeats. After “The capital of France is”, the token “Paris” gets a huge probability and “banana” gets almost none.
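
One step of that loop can be sketched in a few lines. The scores below are made-up numbers, not output from any real model, and `softmax` and `sample_next_token` are illustrative helper names. The sketch assumes the model produces a raw score per token, which is turned into probabilities before sampling.

```python
import math
import random

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    biggest = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - biggest) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def sample_next_token(probs, rng):
    """Pick one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical scores after "The capital of France is" (made-up numbers).
scores = {" Paris": 9.0, " Lyon": 4.0, " a": 3.0, " banana": -2.0}
probs = softmax(scores)

rng = random.Random(0)  # fixed seed so repeated runs behave the same
print(max(probs, key=probs.get))      # the most likely token
print(sample_next_token(probs, rng))  # one sampled token
```

To generate a whole sentence, the sampled token would be appended to the text and the same step run again.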

3. Temperature - the randomness dial

Temperature controls how adventurous the sampling is. Low = almost always pick the most likely token (focused, consistent). High = give less likely tokens a better chance (creative, but riskier). Prompt “a tagline for a coffee shop”: low temp → “Great coffee, every day.” High temp → “Sip the sunrise.”
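
A common way to implement this dial is to divide the model’s raw scores by the temperature before converting them to probabilities. The sketch below assumes that approach; the scores are invented for illustration and `apply_temperature` is a hypothetical helper name.

```python
import math

def apply_temperature(scores, temperature):
    """Softmax of scores / temperature; lower values sharpen the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = {tok: s / temperature for tok, s in scores.items()}
    biggest = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - biggest) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical scores for the first word of a coffee-shop tagline.
scores = {"Great": 2.0, "Fresh": 1.5, "Sip": 0.5, "Moonlit": -1.0}

for t in (0.2, 1.0, 2.0):
    probs = apply_temperature(scores, t)
    print(t, {tok: round(p, 3) for tok, p in probs.items()})
```

At 0.2 nearly all of the probability sits on “Great”; at 2.0 it spreads out, so unusual words such as “Moonlit” are sampled more often.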

4. How it learned - two phases

Visual. From internet-scale text to a helpful assistant.
  • Pre-training: read trillions of words, just predicting the next token. No labels needed.
  • Post-training (fine-tuning + RLHF, reinforcement learning from human feedback): coach it to be helpful, honest, and safe.
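
The “no labels needed” point can be seen in toy form: the next word in the text is itself the answer being predicted. The sketch below simply counts which word follows which in a tiny made-up corpus. Real pre-training adjusts neural-network parameters over subword tokens rather than counting whole words, so treat this only as an illustration of where the training signal comes from; `train_bigram` and `predict_next` are hypothetical names.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the text."""
    words = text.split()
    counts = defaultdict(Counter)
    # Each adjacent pair is a free training example: (context, correct next word).
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return next-word probabilities estimated from the counts."""
    following = counts.get(word)
    if not following:
        return {}  # word never seen with a follower
    total = sum(following.values())
    return {w: c / total for w, c in following.items()}

# Tiny made-up corpus standing in for internet-scale text.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" about 0.67, "mat" about 0.33
```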

5. Scale and parameters

Parameters are the model’s adjustable dials, set during training - modern models have billions. In general, more parameters + more data + more compute → more capability. That is the “scaling” story behind the leaps you’ve seen.
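
To give “billions of parameters” some shape, here is a back-of-the-envelope count for a stack of fully connected layers, where each layer has one weight per input-output pair plus one bias per output. The layer widths are made up for illustration and do not describe any real model; real LLMs use transformer layers whose parameter counts are worked out differently.

```python
def dense_layer_params(n_inputs, n_outputs):
    """Weights (n_inputs * n_outputs) plus one bias per output."""
    return n_inputs * n_outputs + n_outputs

def count_params(layer_sizes):
    """Total parameters for a chain of dense layers with the given widths."""
    pairs = zip(layer_sizes, layer_sizes[1:])
    return sum(dense_layer_params(a, b) for a, b in pairs)

# Hypothetical widths, chosen only to show how fast the count grows.
small = count_params([512, 512, 512])
large = count_params([8192, 8192, 8192])
print(f"{small:,}")  # 525,312
print(f"{large:,}")  # 134,234,112
```

Making the layers 16 times wider multiplies the parameter count by roughly 256, which is why totals climb into the billions quickly as models grow.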

Interactive

Be the model

Set the temperature, then sample the next token - just like an LLM.

The weather today is ___

  • sunny: 60%
  • cloudy: 19%
  • rainy: 11%
  • warm: 6%
  • cold: 3%
  • unpredictable: 1%
  • banana: 0%

Temperature: 0.70 (slider from “Focused (consistent)” to “Creative (riskier)”)

Low temperature makes the bars peaky - it almost always picks “sunny.” Crank it up and the bars flatten, so even “banana” gets a turn.

Recap
  • LLMs work in tokens, predicting the next one from probabilities.
  • Pre-training builds knowledge; post-training (fine-tuning + RLHF) makes it helpful.
  • Scale drives capability; temperature controls randomness.
