Press & Media

The story

Everything you need to cover GuessTheModel.

The one-liner

"A blind taste test for AI — and most people can't tell which model they're reading."

What it is

GuessTheModel is the social human benchmark for AI. Every day, the same prompt is sent to five leading AI models — Claude, ChatGPT, Gemini, Grok, and Perplexity — under identical conditions. The outputs are displayed with no model names attached, in a shuffled order that is identical for every visitor — the daily puzzle is shared, like a crossword.
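For technically minded readers, one way a shared daily order could be produced is to seed the shuffle from the battle date. This is a minimal sketch under that assumption, not a description of GuessTheModel's actual code; the function name `daily_order` and the date-string seed are hypothetical.

```python
import hashlib
import random

MODELS = ["Claude", "ChatGPT", "Gemini", "Grok", "Perplexity"]

def daily_order(battle_date: str, models=MODELS) -> list[str]:
    """Return a shuffled model order that depends only on the battle date."""
    # Derive the seed from a SHA-256 hash of the date (assumed seed source).
    # Unlike the built-in hash() of a string, this is the same in every process.
    digest = hashlib.sha256(battle_date.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    rng = random.Random(seed)
    order = list(models)  # copy so the caller's list is not mutated
    rng.shuffle(order)
    return order
```

Because the seed comes only from the date, two visitors calling `daily_order("2025-01-01")` would see the same arrangement, while a different date gives a fresh shuffle.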

Players see one answer at a time and tap which AI they think wrote it — instant reveal after every tap, five rounds, score out of five. The finale shows how their score compares to everyone else's that day, plus which answer the crowd crowned best.

The result is a continuously updated, crowd-sourced benchmark that no AI company controls, funds, or influences.

Why it matters

Most AI benchmarks are created by the AI companies themselves, run on synthetic test sets, or require technical expertise to interpret. GuessTheModel is different:

  • No brand names until after you vote — brand bias is structurally eliminated
  • Real prompts, real tasks — not capability tests, but the things people actually use AI for
  • Crowd consensus at scale — one reviewer's taste doesn't set the ranking
  • Ongoing — new battle every day, rankings update in real time

Story angles

"The blind taste test"

Most people cannot reliably identify which AI wrote a response. The gap between brand perception and actual preference is measurable — and surprising.

"The benchmark nobody owns"

While AI labs race to top leaderboards they helped design, a crowd-sourced alternative is building a different kind of evidence.

"Can you tell which AI wrote this?"

An interactive piece — send reporters the link and let them vote before they write. The wrong-guess rate makes the story.

"The AI personality test"

Ask all five models the same controversial question. The differences in tone, hedging, and confidence reveal something real about each model's character.

Facts for your article

  • 5 models compared per battle: Claude (Anthropic), ChatGPT (OpenAI), Gemini (Google), Grok (xAI), Perplexity (Perplexity AI)
  • Same prompt, same temperature (0.7), same max token limit for all models
  • Display order is shuffled once per battle and is the same for every visitor, so an answer's position gives no hint about which model wrote it
  • One vote per person per battle, tracked by anonymous browser fingerprint
  • New battle published every day
  • All aggregate vote data is public on the leaderboard
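As an illustration of the one-vote rule listed above, the sketch below rejects a second vote from the same fingerprint within a battle while keeping public tallies. It assumes the fingerprint is an opaque string and uses in-memory storage; the `VoteBook` class and its method names are hypothetical and not taken from the site.

```python
class VoteBook:
    """Toy in-memory vote store: one vote per (battle, fingerprint) pair."""

    def __init__(self):
        self._seen = set()   # (battle_id, fingerprint) pairs that already voted
        self.tallies = {}    # battle_id -> {choice: vote count}

    def cast(self, battle_id: str, fingerprint: str, choice: str) -> bool:
        """Record a vote; return False if this fingerprint already voted."""
        key = (battle_id, fingerprint)
        if key in self._seen:
            return False  # duplicate vote for this battle is ignored
        self._seen.add(key)
        battle = self.tallies.setdefault(battle_id, {})
        battle[choice] = battle.get(choice, 0) + 1
        return True
```

The same fingerprint can still vote in a different battle, which matches the "per battle" wording; only the aggregate `tallies` would need to be exposed publicly.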

Contact

For press enquiries, data access, or an embargo briefing:

mark@guessthemodel.com

We're happy to provide raw vote data, schedule a briefing, or arrange early access to upcoming battle results for reporters working on a story.