The Boundaries

Where LLMs shine vs. where they fail, why hallucinations happen, and how to mitigate the risks.

Ask an AI for a source and it may invent one - perfectly formatted, completely fake.

A real case

Lawyers filed a brief citing court cases that an AI had generated. The cases didn’t exist, and the lawyers were sanctioned. The model wasn’t lying on purpose - it was doing exactly what it does: producing plausible-sounding text.

What LLMs can and can’t do

The big picture: LLMs are brilliant at working with language, and shaky whenever an answer has to be exactly, verifiably true.

Green zone - use freely

Drafting & rewriting - turn 5 messy bullets into a polished client email.
Summarizing - condense a 40-message thread into 5 takeaways.
Explaining simply - 'explain a mortgage to a 12-year-old.'
Brainstorming - 20 product name ideas.
Translating & reformatting - notes into a clean table.
Working with text you give it - pull action items from a transcript.

Red zone - verify

Exact facts from memory - 'what was our Q3 revenue?'
Fresh or real-time info - 'what's in the news today?'
Precise math & counting - 'how many r's in strawberry?'
Your private/internal data - it doesn't know your wiki or CRM.
Strict multi-step logic - a tricky scheduling puzzle in one shot.
Knowing its own limits - it often answers anyway instead of saying 'I don't know.'

The pattern: language tasks = use freely (and skim). Exact-truth tasks = verify, or give it the facts.

Why these limits exist

1. Hallucination

Hallucination happens because the model predicts plausible tokens. When it lacks a fact, it fills the gap with something that looks right - there’s no built-in “I checked this” step. Highest risk: niche or recent topics, exact numbers, names, citations.
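
A toy sketch of the idea, assuming a made-up table of next-word probabilities (the numbers are invented for illustration, not measured from any real model). Greedy prediction simply picks whatever scores highest, and nothing in the function checks the answer against the facts.

```python
# Invented probabilities for illustration only; no real model produced these.
TOY_NEXT_WORD_PROBS = {
    "The capital of Australia is": {"Sydney": 0.55, "Canberra": 0.40, "Melbourne": 0.05},
}

def predict_next_word(prompt: str) -> str:
    """Greedy decoding: return the single most probable continuation."""
    candidates = TOY_NEXT_WORD_PROBS[prompt]
    # Plausibility is the only criterion here: there is no step that
    # looks the answer up in a trusted source before returning it.
    return max(candidates, key=candidates.get)

answer = predict_next_word("The capital of Australia is")
print(answer)  # prints "Sydney"; the correct answer is Canberra
```

Real models work over tokens and much larger vocabularies, but the gap is the same: a fluent, high-probability answer is not a verified one.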

2. Bias

It’s trained on human-written internet text, so it absorbs human bias and over-represents majority views. Ask it to “describe a nurse and a CEO” and, unprompted, it may default the nurse to “she” and the CEO to “he.”
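
One way to check for this kind of skew, sketched under assumptions: collect several responses to the same prompt and tally gendered pronouns per role. The SAMPLE_OUTPUTS below are hypothetical strings written for illustration, not real model output, and pronoun counting is only a crude first signal, not a full bias audit.

```python
import re
from collections import Counter

# Hypothetical responses written for illustration, not real model output.
# A real audit would collect many responses to the same prompt.
SAMPLE_OUTPUTS = {
    "nurse": ["She checks the chart.", "She greets the patient.", "He adjusts the IV."],
    "CEO": ["He leads the meeting.", "He signs the deal.", "She reviews the budget."],
}

HE_WORDS = {"he", "him", "his"}
SHE_WORDS = {"she", "her", "hers"}

def pronoun_counts(texts: list) -> Counter:
    """Tally he/him/his versus she/her/hers across a list of texts."""
    counts = Counter()
    for text in texts:
        # Whole-word matching, so "checks" does not count as "he".
        for word in re.findall(r"[a-z]+", text.lower()):
            if word in HE_WORDS:
                counts["he"] += 1
            elif word in SHE_WORDS:
                counts["she"] += 1
    return counts

for role, texts in SAMPLE_OUTPUTS.items():
    print(role, dict(pronoun_counts(texts)))
```

A consistent lopsided tally across many prompts is a cue to rephrase the request (for example, asking for gender-neutral descriptions) and to review the output more carefully.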

3. Knowledge cutoff & no live data

It only “knows” what was in its training data up to its knowledge cutoff, and it can’t see today’s news or your private documents unless it’s connected to tools or search (Level 8, RAG).

4. Weak at exact math & strict logic

It’s matching language patterns, not running a calculator, so it can fumble arithmetic or “count the letters” tasks - unless it works step by step or uses tools.
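
A minimal sketch of the "uses tools" fix, assuming the model can hand exact work to ordinary code. The HYPOTHETICAL_TOKENS split is invented to show that a model receives multi-letter chunks rather than individual letters; real tokenizers split words differently.

```python
# Hypothetical split: real tokenizers differ, but a model does receive
# multi-letter chunks like these, not individual letters.
HYPOTHETICAL_TOKENS = ["str", "aw", "berry"]

def count_letter(word: str, letter: str) -> int:
    """Exact, case-insensitive letter count: a job for code, not prediction."""
    return word.lower().count(letter.lower())

word = "".join(HYPOTHETICAL_TOKENS)  # reassemble the chunks into "strawberry"
print(count_letter(word, "r"))       # prints 3
print(17 * 243)                      # prints 4131; integer arithmetic is exact
```

Code answers these questions the same way every time, which is why letting the model call a calculator or interpreter beats asking it to "just know."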

5. Context limits

Its working memory, the context window, holds only so much text at once. Very long inputs get truncated, or key details get lost in the middle.
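
A minimal sketch of one common way an app keeps a conversation inside the context window: drop the oldest messages first. It assumes one token per word and a tiny hypothetical CONTEXT_LIMIT so the effect is visible; real models count subword tokens, real limits are far larger, and apps handle overflow in different ways.

```python
# Simplifying assumptions: one token per whitespace-separated word, and a tiny
# hypothetical limit. Real models use subword tokens and far larger windows.
CONTEXT_LIMIT = 12

def fit_to_context(messages: list, limit: int = CONTEXT_LIMIT) -> list:
    """Keep the newest messages that fit in `limit` tokens; drop the oldest first."""
    kept, used = [], 0
    for message in reversed(messages):   # walk from newest to oldest
        cost = len(message.split())      # approximate token count
        if used + cost > limit:
            break                        # this and all older messages are dropped
        kept.append(message)
        used += cost
    return list(reversed(kept))          # restore chronological order

conversation = [
    "The budget cap is 5000 dollars.",   # 6 "tokens": the key detail, stated first
    "Please draft the vendor email.",    # 5
    "Keep it short and friendly.",       # 5
]
print(fit_to_context(conversation))
# prints ['Please draft the vendor email.', 'Keep it short and friendly.']
```

The model then drafts the email without ever seeing the budget cap, which is why restating key constraints near the end of a long conversation helps.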

6. AI slop

AI slop is mass-produced, low-effort generated content. Beyond being low-value, it pollutes the web - and future training data - creating a feedback loop in which new models learn from old models’ output.

Confidence is not correctness

A model can sound fully confident while the answer is wrong. Read a confident tone as style, not proof.

How to mitigate

Verify anything that matters against a trusted source (Lesson 2.6 teaches the technique).
Give it the facts - paste the source or use retrieval - instead of trusting its memory.
For logic/math, ask for steps or use a reasoning model/tool.
Read confident tone as style, not proof.
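
A minimal sketch of "give it the facts", assuming a hypothetical DOCUMENTS store standing in for your wiki. Retrieval here is naive word overlap, a stand-in for real search or embedding-based retrieval, and the code only builds the prompt text; sending it to a model is left out.

```python
import re

# Hypothetical internal notes, standing in for your wiki or CRM.
DOCUMENTS = {
    "refund-policy": "Customers may request a refund within 30 days of purchase.",
    "support-hours": "The support desk is staffed 9am to 5pm on weekdays.",
}

def words(text: str) -> set:
    """Lowercase word set, used for crude keyword matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, docs: dict, k: int = 1) -> list:
    """Return up to k (name, text) pairs sharing the most words with the question."""
    q = words(question)
    ranked = sorted(docs.items(), key=lambda item: len(q & words(item[1])), reverse=True)
    return [(name, text) for name, text in ranked[:k] if q & words(text)]

def build_grounded_prompt(question: str, docs: dict) -> str:
    """Paste the retrieved sources into the prompt and allow an explicit 'I don't know'."""
    sources = retrieve(question, docs)
    source_block = "\n".join(f"[{name}] {text}" for name, text in sources)
    return (
        "Answer using only the sources below, and name the source you used. "
        "If the sources do not contain the answer, reply: I don't know.\n\n"
        f"Sources:\n{source_block or '(none found)'}\n\n"
        f"Question: {question}"
    )

print(build_grounded_prompt("How many days do customers have to request a refund?", DOCUMENTS))
```

Giving the model explicit permission to say "I don't know" and asking it to name its source both make the answer easier to check against the original.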

Recap

LLMs predict plausible text, so they can hallucinate, carry bias, and miss recent or private facts.
By default they're weak at exact math/counting and limited by context size.
Verify what matters, supply the facts, and never mistake confidence for correctness.