Ask an AI for a source and it may invent one - perfectly formatted, completely fake.
Lawyers filed a brief citing court cases that an AI had generated. The cases didn’t exist, and the lawyers were sanctioned. The model wasn’t lying on purpose - it was doing exactly what it does: producing plausible-sounding text.
What LLMs can and can’t do
The big picture: LLMs are brilliant at working with language, and shaky whenever an answer has to be exactly, verifiably true.
Green zone - use freely
- Drafting & rewriting - turn 5 messy bullets into a polished client email.
- Summarizing - condense a 40-message thread into 5 takeaways.
- Explaining simply - 'explain a mortgage to a 12-year-old.'
- Brainstorming - 20 product name ideas.
- Translating & reformatting - notes into a clean table.
- Working with text you give it - pull action items from a transcript.
Red zone - verify
- Exact facts from memory - 'what was our Q3 revenue?'
- Fresh or real-time info - 'what's in the news today?'
- Precise math & counting - 'how many r's in strawberry?'
- Your private/internal data - it doesn't know your wiki or CRM.
- Strict multi-step logic - a tricky scheduling puzzle in one shot.
- Knowing its own limits - it answers anyway instead of 'I don't know.'
The pattern: language tasks = use freely (but skim the output). Exact-truth tasks = verify, or give it the facts.
Why these limits exist
1. Hallucination
Hallucination happens because the model predicts plausible next tokens rather than looking facts up. When it lacks a fact, it fills the gap with something that looks right - there’s no built-in “I checked this” step. Highest risk: niche or recent topics, exact numbers, names, citations.
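To make "predicts plausible tokens" concrete, here is a toy sketch. The candidate continuations and their scores are invented for illustration and do not come from any real model; the point is only that the selection step rewards what sounds natural and never checks what is true.

```python
import math

def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores.values())
    exps = {tok: math.exp(s - peak) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical scores for continuing "The study was published in ..."
# They reflect how natural each option sounds, not whether it is true.
scores = {"2019": 2.0, "2021": 1.5, "I am not sure": -1.0}

probs = softmax(scores)
choice = max(probs, key=probs.get)
print(choice)  # 2019
```

Nothing in this loop consults a source, which is why a confident, well-formed wrong answer comes out as easily as a right one.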
2. Bias
The model is trained on human-written internet text, so it absorbs human bias and over-represents majority views. Ask it to “describe a nurse and a CEO” and, unprompted, it may default the nurse to “she” and the CEO to “he.”
3. Knowledge cutoff & no live data
It only “knows” what was in its training data up to its knowledge cutoff, and it can’t see today’s news or your private documents unless it is connected to tools/search (Level 8, RAG).
4. Weak at exact math & strict logic
It’s matching language patterns, not running a calculator, so it can fumble arithmetic or “count the letters” tasks - unless it works step by step or uses tools.
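Counting is a case where a deterministic tool beats pattern-matching. A minimal sketch of what "uses tools" can look like; the helper name count_letter is made up for this example and is not part of any library.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())

print(count_letter("strawberry", "r"))  # 3
```

The code gives the same answer every time, which is exactly the property a language model lacks on this kind of task.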
5. Context limits
Its working memory is the context window: it can only hold so much text at once, so very long inputs get truncated or key details get lost in the middle.
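One common workaround is to split a long input into smaller pieces before sending it. The sketch below assumes word count as a rough stand-in for tokens; real tokenizers count differently, and both the function name and the budget are illustrative.

```python
def split_into_chunks(text: str, max_words: int) -> list[str]:
    """Split text into consecutive chunks of at most max_words words.

    Word count is only a rough stand-in for tokens; real tokenizers
    count differently, so leave headroom when choosing max_words.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

chunks = split_into_chunks("one two three four five six seven", max_words=3)
print(chunks)  # ['one two three', 'four five six', 'seven']
```

Each chunk can then be summarized on its own and the summaries combined, so no single request has to hold the whole document.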
6. AI slop
AI slop is mass-produced, low-effort generated content. Beyond being low value, it pollutes the web - and future training data - creating a feedback loop.
Confidence is not correctness
A model can sound completely confident while its answer is wrong. Read a confident tone as style, not proof.
How to mitigate
- Verify anything that matters against a trusted source (Lesson 2.6 teaches the technique).
- Give it the facts - paste the source or use retrieval - instead of trusting its memory.
- For logic/math, ask for steps or use a reasoning model/tool.
- Read confident tone as style, not proof.
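"Give it the facts" can be as simple as wrapping the question with pasted source text. A minimal sketch, assuming you paste the excerpt yourself; the function name and the prompt wording are illustrative, and no particular model or API is implied.

```python
def build_grounded_prompt(question: str, source_text: str) -> str:
    """Wrap a question with pasted source text and an instruction to stay within it."""
    return (
        "Answer using only the source below. "
        "If the source does not contain the answer, reply \"I don't know.\"\n\n"
        f"Source:\n{source_text.strip()}\n\n"
        f"Question: {question.strip()}"
    )

prompt = build_grounded_prompt(
    question="What was our Q3 revenue?",
    source_text="(paste the relevant report excerpt here)",
)
print(prompt)
```

The explicit "I don't know" instruction gives the model a permitted way out, but the answer should still be checked against the pasted source.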
Key takeaways
- LLMs predict plausible text, so they can hallucinate, carry bias, and miss recent or private facts.
- By default they're weak at exact math/counting and limited by context size.
- Verify what matters, supply the facts, and never mistake confidence for correctness.

