How AI Content Detection Actually Works (And Why It's Mostly Guesswork)

The simple version: AI detectors don't detect AI. They detect patterns that are statistically common in AI-generated text. This distinction matters because it's the difference between forensic evidence and educated guessing.

If you're using AI detectors to screen job applications, grade student essays, or audit freelance writing, you need to understand what these tools actually measure — and how often they're wrong.

The Simple Version

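AI detectors use three main signals:

  • Perplexity: how predictable each next word is to a language model. Predictable text scores low and is more likely to be flagged as AI.

  • Burstiness: how much sentence length and grammatical complexity vary across the text. AI text tends toward uniformity; human text varies more.

  • Classifier models: machine learning models trained on thousands of labeled human and AI text samples. They learn subtle patterns in phrasing, transition words, and punctuation usage.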

A detector combines these scores and outputs a percentage: "87% likely AI-generated."

How It Actually Works

Behind the simple percentage is a more complex process.

Perplexity scoring works by running the text through a language model (often a smaller variant of GPT or a specialized model) and measuring how confidently it predicts each next word. Low perplexity = predictable text = flagged as AI. The problem: technical writing, legal documents, and standardized reports also have low perplexity. They're human-written but structurally predictable.
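To make the perplexity idea concrete, here is a minimal sketch. It assumes a toy unigram model with add-one smoothing as a stand-in for the real language model a detector would use; the function names and the tiny reference corpus are made up for illustration. The scoring step is the same in spirit: average the log-probability the model assigns to each word, then exponentiate the negative.

```python
import math
from collections import Counter


def train_unigram(corpus: str):
    """Count word frequencies in a reference corpus (toy stand-in for a real LM)."""
    tokens = corpus.lower().split()
    return Counter(tokens), len(tokens)


def perplexity(text: str, counts: Counter, total: int) -> float:
    """Perplexity of `text` under the unigram model, with add-one smoothing."""
    tokens = text.lower().split()
    if not tokens:
        raise ValueError("text must contain at least one word")
    vocab_size = len(counts) + 1  # +1 reserves probability mass for unseen words
    log_prob_sum = 0.0
    for tok in tokens:
        # Counter returns 0 for unseen words, so they get the smoothed minimum.
        prob = (counts[tok] + 1) / (total + vocab_size)
        log_prob_sum += math.log(prob)
    return math.exp(-log_prob_sum / len(tokens))


# Hypothetical reference corpus, for illustration only.
counts, total = train_unigram("the cat sat on the mat the dog sat on the rug")

predictable = perplexity("the cat sat on the mat", counts, total)
surprising = perplexity("quantum zebras juggle neon pianos", counts, total)
print(predictable < surprising)  # True: familiar wording scores lower
```

A real detector would swap the unigram counts for a neural language model's token probabilities, but the failure mode is already visible here: any text that closely matches what the model has seen, whoever wrote it, scores as "predictable."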

Burstiness analysis measures the standard deviation of sentence lengths and grammatical complexity. AI tends toward the mean. Humans vary more. The problem: edited professional writing is often de-burstified by editors. A New Yorker article may score as "AI" because the editing process smoothed out the natural variation.
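As a rough illustration, burstiness can be approximated by the standard deviation of sentence lengths. The sketch below assumes a naive split on sentence-ending punctuation and ignores grammatical complexity entirely; the function names are hypothetical, and commercial detectors do not publish their exact formulas.

```python
import re
import statistics


def sentence_lengths(text: str) -> list:
    """Word count per sentence, using a naive split on '.', '!' and '?'."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Population standard deviation of sentence lengths (0.0 if under 2 sentences)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths)


uniform = "The cat sat down. The dog ran off. The bird flew away."
varied = (
    "No. The storm that rolled in over the hills last night "
    "kept everyone awake until dawn. We slept late."
)

print(burstiness(uniform))                       # 0.0
print(burstiness(varied) > burstiness(uniform))  # True
```

Under this measure, tightly edited prose with evenly sized sentences looks the same as machine output, which is the false-positive risk described above.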

Classifier models are the most opaque. They're trained on datasets of known AI and human text, but the training data is often biased toward specific models (early GPT-3.5 output, for example) and specific domains (student essays, marketing copy). A classifier trained on GPT-3.5 essays may fail completely on Claude 3.7 technical documentation.

Why Everyone Gets This Wrong

Myth 1: "98% accuracy" means 98% of flags are correct.

This is almost always base rate neglect. Suppose 5% of submitted text is actually AI-generated, and "98% accurate" means the detector catches 98% of AI text and wrongly flags 2% of human text. With 1,000 submissions, that is 49 true positives from the 50 AI texts and 19 false positives from the 950 human ones. Of the 68 flags, 19 are wrong: roughly 28%, or more than one in four.
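The arithmetic is easy to check. This sketch assumes "98% accurate" means both a 98% true positive rate and a 2% false positive rate, which is one common reading of such claims but not the only one; the function and parameter names are illustrative.

```python
def flag_counts(n: int, ai_rate: float, tpr: float, fpr: float):
    """Expected true positives, false positives, and precision among flagged texts."""
    ai_texts = n * ai_rate
    human_texts = n - ai_texts
    true_positives = ai_texts * tpr      # AI texts correctly flagged
    false_positives = human_texts * fpr  # human texts wrongly flagged
    precision = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, precision


tp, fp, precision = flag_counts(n=1000, ai_rate=0.05, tpr=0.98, fpr=0.02)
print(f"{tp:.0f} true positives, {fp:.0f} false positives, precision {precision:.1%}")
# 49 true positives, 19 false positives, precision 72.1%
```

Lowering `ai_rate` makes the picture worse: the rarer AI text is in the pool, the larger the share of flags that land on human writers.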

Myth 2: A low score means the text is human-written.

Detectors are calibrated to flag AI text. They're not calibrated to prove human authorship. A score of "15% AI" just means the detector didn't see familiar AI patterns. It doesn't mean a human wrote it.

Myth 3: Multiple detectors confirming the same result increases confidence.

Most commercial detectors use similar underlying models and training data. Agreement between detectors often reflects shared bias, not independent confirmation.
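A short Bayes' rule calculation shows why independence matters. It reuses the illustrative numbers from Myth 1 (5% base rate, 98% true positive rate, 2% false positive rate) and assumes two idealized extremes: detectors whose errors are fully independent, and detectors whose errors are perfectly correlated. Real detectors fall somewhere in between.

```python
def posterior_ai(prior: float, tpr: float, fpr: float, independent_flags: int) -> float:
    """P(text is AI | flagged by `independent_flags` detectors with independent errors)."""
    likelihood_ai = tpr ** independent_flags
    likelihood_human = fpr ** independent_flags
    numerator = prior * likelihood_ai
    return numerator / (numerator + (1 - prior) * likelihood_human)


one_flag = posterior_ai(prior=0.05, tpr=0.98, fpr=0.02, independent_flags=1)
two_independent = posterior_ai(prior=0.05, tpr=0.98, fpr=0.02, independent_flags=2)

# Two detectors that always make the same mistakes carry the evidence of one.
two_correlated = one_flag

print(f"one flag: {one_flag:.1%}")
print(f"two independent flags: {two_independent:.1%}")
print(f"two perfectly correlated flags: {two_correlated:.1%}")
```

If a second detector shares the first one's training data and blind spots, its agreement moves the probability much less than the independent case suggests.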

The Catch (What's Still Hard)

The arms race makes detectors obsolete quickly. Every new model (GPT-4o, Claude 3.7, Gemini 2.5) changes the statistical fingerprint. Detectors trained on last year's output gradually lose accuracy. OpenAI withdrew its own AI text classifier in July 2023, citing its low rate of accuracy.

Mixed content is nearly impossible to classify correctly. A human writes an outline, AI expands each section, then a human edits heavily. The final text contains both human and AI statistical signatures. Current detectors either flag it as AI (false positive) or clear it as human (false negative). Neither is correct.

Adversarial editing defeats most detectors. Adding spelling errors, varying capitalization, or inserting idiomatic phrases can drop a "95% AI" score to "30% AI" without changing the substance. This is trivial to do, so detection is least reliable against exactly the people trying to evade it.

What's Still Hard

  • The ethics of detection are unresolved. Using detectors for hiring, academic grading, or content auditing without understanding their limitations creates real harm from false accusations.

The Bottom Line

AI detectors estimate how closely a text resembles known AI output; they do not establish who wrote it. If you use one to screen job applications, grade essays, or audit freelance work, treat the score as a weak signal that prompts a closer look, not as evidence for an accusation.