When AI Answers Differ Each Time
The AI Feedback Lottery
Large Language Models judge your work differently every time you ask. That is not just a quirk of the technology — it is a fundamental challenge to how we create, validate, and trust ideas.
Imagine you have spent weeks on a business plan. You show it to an AI assistant and it comes back glowing: "This is a compelling, well-structured concept — you are ready to move forward." Energised, you close the chat window and get to work. A few days later, curiosity gets the better of you. You open a fresh conversation, paste in the exact same plan with the exact same prompt, and wait. The verdict this time? "There are significant structural weaknesses here. The market analysis is underdeveloped, the financial projections rest on untested assumptions, and the value proposition needs a substantial rethink."
Same document. Same question. Two completely different realities.
This is not a bug. It is one of the most underappreciated structural truths about working with Large Language Models — and it carries real consequences for anyone using AI as a creative partner, a sounding board, or a validator of their ideas.
LLMs can't give you a consistent opinion
A Large Language Model does not have a stable point of view the way a human colleague does. Each response is generated fresh, shaped by a cascade of probabilistic decisions. Strip away the impressive language and what you are left with is a very sophisticated engine for predicting what a plausible, coherent answer looks like — given the conversation context it currently holds in memory.
When you clear a conversation, that context vanishes entirely. The model has no memory of its previous enthusiasm or its previous doubts. It approaches your work as if for the first time, in a subtly different statistical mood. The same prompt, on a different draw of the same probabilistic deck, may land on caution rather than encouragement — or vice versa.
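This statelessness can be illustrated with a toy simulation. To be clear, this is not a real LLM call: the "model" below is just an independent weighted draw from a fixed distribution over plausible verdicts, and the labels and odds are invented for illustration. The point it demonstrates is that identical prompts to a fresh, memoryless process can land on opposite answers.

```python
import random

# Invented, illustrative odds -- a stand-in for the model's output distribution.
VERDICTS = {"encouraging": 0.55, "critical": 0.45}

def fresh_session_verdict(rng: random.Random) -> str:
    """One stateless 'session': a single weighted draw with no memory of past draws."""
    return rng.choices(list(VERDICTS), weights=list(VERDICTS.values()))[0]

rng = random.Random()  # unseeded: every run behaves like a new conversation
draws = [fresh_session_verdict(rng) for _ in range(20)]
print(draws.count("encouraging"), "encouraging vs", draws.count("critical"), "critical")
```

Run it a few times and the split shifts, yet every individual verdict reads as if it were the model's settled opinion.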
Session One — Same Prompt: "This is an excellent foundation. The core idea is strong, the structure is sound, and the execution is ready for prime time. I would move forward with confidence."
Session Two — Same Prompt: "This has potential, but the concept has a number of significant flaws that would need to be addressed before it is viable. I would recommend a substantial rework before proceeding."
Both responses come from the same model. Both are coherent, well-reasoned, and written with conviction. Neither is lying. But they cannot both be right — and yet the model will defend either one with equal fluency if you ask it to.
J.K. Rowling and the luck of the draw
When J.K. Rowling finished the first Harry Potter manuscript, she did not receive one verdict. She received twelve — and every single one of them, from twelve different professional editors at twelve different publishers, was a rejection. The story that would become one of the most successful children's book series of recent decades was, by the consensus of expert opinion at the time, not worth publishing.
The editors were not incompetent. They were making genuine, experienced judgments — but those judgments were shaped by the mood of the reader, the publisher's current preferences, the editor's personal sensibility that day, and a thousand other unknown variables. Creative evaluation has always been, at its core, a probabilistic process. AI has simply made that probabilistic chaos instantaneous, frictionless, and ungrounded in reality.
What changed with AI is not the inconsistency. What changed is that we now mistake the confident, well-structured prose of an AI response for a stable, reliable verdict. Rowling's rejection letters at least came with the knowledge that they were human judgments — fallible, contextual, one-of-twelve. An AI response arrives wearing the aesthetic of authority: not a polite one-page "we are sorry we cannot publish you", but pages of long, dense, confident feedback.
After those rejections, J.K. Rowling sent her manuscript out a thirteenth time, and that submission opened the door to the successful launch of Harry Potter and the Philosopher's Stone in 1997. What would she have done had she submitted her manuscript to a Large Language Model instead?
Three ways to deal with LLM feedback
When creators encounter this inconsistency in AI feedback, there are three typical ways to respond, and only one of them is right.
The Deflated Creator
A self-critical person will respond to this kind of AI feedback discrepancy with discouragement. If the AI loved your work this morning and dismissed it this afternoon, what does that say about the work? For many people, especially those who already carry self-doubt about their creative output, a single harsh AI verdict can be enough to shelve an idea entirely. The cruelty here is compounded by the fact that the harsh verdict may have been the statistical outlier — the equivalent of that one editor at one publisher who happened to be having a bad day. But without the context that this was simply one draw from a probabilistic deck, it reads as a definitive judgment, and it retroactively devalues the positive opinion received earlier.
The Overconfident Creator
The second response is the mirror image: unfounded confidence. A confident creator shops sessions until they get the enthusiastic verdict they were hoping for, takes a screenshot, and proceeds as though the AI has validated their work. This is not dishonest — it is a perfectly natural psychological response to an uncertain situation. But it is dangerous, because the encouragement carries no more weight than the discouragement. It is simply the lucky draw, not an informed assessment. The creator moves forward with conviction built on sand.
The Sophisticated Creator
The third response is the only genuinely useful one: treating the inconsistency as information in itself. A creator who asks the same AI the same question multiple times — or who deliberately varies the framing to stress-test an idea — is not gaming the system. They are doing something close to what a good editor or creative director does when they seek multiple opinions before committing to a direction. The variance in the responses tells them something real: where the idea is genuinely strong (consistent praise across sessions), and where it is genuinely fragile (inconsistent or conflicting feedback).
This is a skill. It requires understanding what the tool is — and what it is not.
Mistaking Fluency for Stability
The deeper issue is not that AI is inconsistent. Everything creative is inconsistent — human taste, market timing, editorial fashion. The deeper issue is that AI's extraordinary fluency disguises its instability. A response generated by a Large Language Model reads with the confidence of a considered opinion because it has been trained on text written by humans who were expressing considered opinions. The surface features of certainty are all there. The underlying stability is not.
This creates a particular trap for creators who are using AI precisely because they want an outside perspective — an honest, disinterested voice that will tell them whether their work is good. The AI performs that role convincingly. But the performance is not the reality.
Rowling's editors, for all their inconsistency, had one thing the AI cannot offer: genuine stakes. Each rejection was from a real person whose professional reputation was on the line, whose years of industry experience shaped the judgment, and who was embedded in a specific market context. Their inconsistency was costly and slow. AI's inconsistency is free and instant — and that changes how seriously we take it, almost always in the wrong direction.
How to use AI as a creative partner?
None of this means AI feedback is worthless. A single AI session can surface blind spots, identify unclear passages, suggest alternatives you had not considered, and push you toward a more refined version of your idea. That is genuinely valuable. The mistake is treating any single session as a verdict rather than as one perspective in an ongoing conversation.
A few principles follow naturally from taking the inconsistency seriously. Treat any single AI response as one data point, not a conclusion. Seek multiple sessions with varied framing before drawing any inference about the quality of your work. Pay particular attention to what is consistently flagged across sessions — that is where the real weaknesses live. And preserve your own judgment as the final integrating function. The AI can tell you many things about your work. It cannot tell you whether your work matters, or to whom, or why — and that is still, as it has always been, entirely yours to determine.
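The aggregation step in these principles can be sketched in a few lines. This assumes you have already collected the issues flagged in several independent sessions; the LLM API call and the parsing of its feedback are omitted, and the function name, threshold, and sample data are all illustrative choices, not a prescribed method.

```python
from collections import Counter

def consistent_weaknesses(sessions: list[list[str]], threshold: float = 0.6) -> list[str]:
    """Return issues flagged in at least `threshold` of the independent sessions.

    `sessions` holds one list of flagged issues per fresh conversation.
    Issues are deduplicated within a session before counting.
    """
    counts = Counter(issue for issues in sessions for issue in set(issues))
    cutoff = threshold * len(sessions)
    return [issue for issue, n in counts.most_common() if n >= cutoff]

# Illustrative data: five fresh sessions reviewing the same business plan.
sessions = [
    ["weak market analysis", "untested financials"],
    ["weak market analysis"],
    ["unclear value proposition", "weak market analysis"],
    ["untested financials", "weak market analysis"],
    ["weak market analysis", "untested financials"],
]
print(consistent_weaknesses(sessions))
```

An issue flagged in every session (here, the market analysis) is a real weakness; one that surfaces once is likely a draw from the deck.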
About this post
This post explores a structural characteristic of Large Language Models and its practical implications for creators, entrepreneurs, and anyone using AI as a thinking partner.
Key takeaways
If you want to use AI for feedback, use it with caution and do not take everything it says at face value. Run multiple sessions, just as you would ask multiple people for their opinions.
The original ideas, structure, and much of the language are human-created, but AI was used to develop, enrich, or rework portions of the content — for example, researching sources, rewriting sections for clarity, or expanding on arguments.