BLOG — AI & HUMAN CREATIVITY
The AI Feedback Lottery:
When the Same Prompt Gets a Different Answer
Large Language Models judge your work differently every time you ask. That is not just a quirk of the technology — it is a fundamental challenge to how we create, validate, and trust ideas.
9SENSES.AI — MARCH 2026 | 7 MIN READ
Imagine you have spent weeks on a business plan. You show it to an AI assistant and it comes back glowing: "This is a compelling, well-structured concept. You are ready to move forward." Energised, you close the chat window. A few days later, you open a fresh conversation, paste in the exact same plan with the exact same prompt, and wait. The verdict: "There are significant structural weaknesses here. The market analysis is underdeveloped, and the value proposition needs a substantial rethink."
Same document. Same question. Two completely different realities.
This is not a bug report or a complaint. It is one of the most underappreciated structural truths about working with Large Language Models — and it carries real consequences for anyone using AI as a creative partner, a sounding board, or a validator of their ideas.
Why LLMs Cannot Give You a Consistent Opinion
A Large Language Model does not have a stable point of view the way a human colleague does. Each response is generated fresh, one token at a time, with every next word sampled from a probability distribution; at any non-zero sampling temperature, two runs of the same prompt can take different paths from the very first sentence. Strip away the impressive language and what you are left with is a very sophisticated engine for predicting what a plausible, coherent answer looks like, given the conversation context it currently holds in memory.
When you clear a conversation, that context vanishes entirely. The model has no memory of its previous enthusiasm or its previous doubts. It approaches your work as if for the first time, in a subtly different statistical mood. The same prompt, on a different draw of the same probabilistic deck, may land on caution rather than encouragement — or vice versa.
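You can see this variability directly by sending an identical review prompt to the same model several times in separate, stateless requests. The sketch below is a minimal illustration, assuming the OpenAI Python client; the model name and the prompt are placeholders, and any non-zero sampling temperature (the default) is enough to produce noticeably different verdicts across runs.

```python
# Minimal sketch: same prompt, fresh context each time, potentially different verdicts.
# Assumes the OpenAI Python client (pip install openai) and an API key in the
# OPENAI_API_KEY environment variable. The model name and prompt are placeholders.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Here is my business plan: <plan text>. "
    "Give me a one-paragraph verdict: is this ready to move forward?"
)

verdicts = []
for run in range(5):
    # Each call is a separate, stateless request: the model has no memory of the
    # previous runs, just like opening a fresh chat window.
    response = client.chat.completions.create(
        model="gpt-4o-mini",          # placeholder model name
        messages=[{"role": "user", "content": PROMPT}],
        temperature=1.0,              # any non-zero temperature lets runs diverge
    )
    verdicts.append(response.choices[0].message.content)

for i, verdict in enumerate(verdicts, start=1):
    print(f"--- Run {i} ---\n{verdict}\n")
```

Even at temperature zero the spread narrows but, in practice, does not always vanish for hosted models. The point is not the exact setting; it is that every fresh request is a new draw.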
Both responses come from the same model. Both are coherent, well-reasoned, written with conviction. Neither is lying. But they cannot both be right — and yet the model will defend either one with equal fluency if you ask it to.
“The experts disagreed with each other — they just never had to sit in the same room to do it. Rowling’s manuscript was evaluated in isolation, one desk at a time, and the outcomes were wildly inconsistent. Sound familiar?”
J.K. Rowling and the Luck of the Draw
There is something deeply familiar about this dynamic, even if the technology is new. When J.K. Rowling finished the first Harry Potter manuscript, she did not receive one verdict. She received twelve — and every single one of them, from twelve different professional editors at twelve different publishers, was a rejection. The story that would become one of the most beloved book series in history was, by expert consensus, not worth publishing.
The editors were not incompetent. They were making genuine, experienced judgments — but those judgments were shaped by the mood of the reading, the publisher’s current list, the editor’s personal sensibility that day, and a thousand other invisible variables. Creative evaluation has always been, at its core, a probabilistic process. AI has simply made that probabilistic chaos instantaneous, frictionless, and invisible.
What changed with AI is not the inconsistency itself. What changed is that we now mistake the confident, well-structured prose of an AI response for a stable, reliable verdict. Rowling’s rejection letters came with the knowledge that they were human judgments — fallible, contextual, one-of-twelve. An AI response arrives wearing the aesthetic of authority.
Three Ways Creators Respond — and Only One Helps
The Deflated Creator
The first response is discouragement. If the AI loved your work this morning and dismissed it this afternoon, what does that say about the work? For many people, especially those who already carry self-doubt, a single harsh AI verdict can be enough to shelve an idea entirely. The harsh verdict may have been a statistical outlier — the equivalent of that one editor having a bad day — but without that context, it reads as a definitive judgment.
The Overconfident Creator
The second response is the mirror image: unfounded confidence. A creator keeps opening fresh sessions until one returns the enthusiastic verdict they were hoping for, takes a screenshot, and proceeds as though the AI has validated their work. The encouragement carries no more weight than the discouragement. It is simply the lucky draw, not an informed assessment.
The Sophisticated Creator
The third response is the only genuinely useful one: treating the inconsistency as information in itself. Deliberately asking the same question multiple times, varying the framing to stress-test the idea, reveals something real: where the idea is genuinely strong (consistent praise), where it is genuinely weak (consistent criticism), and where it is simply unresolved (conflicting feedback). This is a skill. It requires understanding what the tool is, and what it is not.
The Real Problem: Mistaking Fluency for Stability
The deeper issue is not that AI is inconsistent. Everything creative is inconsistent: human taste, market timing, editorial fashion. The deeper issue is that AI's extraordinary fluency disguises its instability. A response generated by a Large Language Model reads with the confidence of a considered opinion because the model was trained on text written by humans who were expressing considered opinions. The surface features of certainty are all there. The underlying stability is not.
Rowling’s editors, for all their inconsistency, had one thing the AI cannot offer: genuine stakes. Each rejection came from a real person whose professional reputation was on the line. Their inconsistency was costly and slow. AI’s inconsistency is free and instant — and that changes how seriously we take it, almost always in the wrong direction.
What This Means for Using AI as a Creative Partner
None of this means AI feedback is worthless. A single session can surface blind spots, identify unclear passages, and suggest alternatives you had not considered. That is genuinely valuable. The mistake is treating any single session as a verdict rather than as one perspective in an ongoing conversation.
Treat any single AI response as one data point, not a conclusion. Seek multiple sessions with varied framing before drawing any inference about the quality of your work. Pay particular attention to what is consistently flagged across sessions — that is where the real weaknesses live. And preserve your own judgment as the final integrating function.
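One lightweight way to operationalise this is to keep a note of each session's feedback and count how often each theme recurs. The sketch below is a toy illustration in plain Python; the session notes are invented placeholders, and in practice you would tag themes yourself (or with another model pass) rather than rely on exact string matches.

```python
# Toy sketch: aggregate feedback themes across several AI review sessions.
# The session notes below are invented placeholders for issues you jotted down
# after reading each response.
from collections import Counter

sessions = [
    ["market analysis thin", "pricing unclear", "strong narrative"],
    ["pricing unclear", "go-to-market vague", "strong narrative"],
    ["market analysis thin", "pricing unclear"],
]

counts = Counter(issue for session in sessions for issue in session)
total = len(sessions)

print("Issue frequency across sessions:")
for issue, n in counts.most_common():
    flag = "CONSISTENT" if n == total else ("recurring" if n > 1 else "one-off")
    print(f"  {issue:<22} {n}/{total}  [{flag}]")
```

Issues flagged in every session deserve attention first; a one-off verdict, glowing or damning, is closer to the lucky or unlucky draw this post describes.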
The AI feedback lottery is not a reason to stop asking. It is a reason to ask more carefully, more often, and with a clearer understanding of what you are actually holding in your hands when the answer comes back. Not a verdict. A draw. One more voice in the long, unresolved, beautifully inconsistent conversation that is creative work.
About this post
This post explores a structural characteristic of Large Language Models and its practical implications for creators and entrepreneurs using AI as a thinking partner.
Written with AI assistance (Claude Sonnet 4.6)
Key Takeaway
AI feedback is probabilistic, not stable. Treat every response as one data point — not a conclusion.
Topics
LLM Behaviour • Human Creativity • AI Feedback • Prompt Engineering • AI Literacy