Your "AI User" Is a Fiction. Make It a Better One.
LLM-based user simulations are only as good as the personas behind them. Most teams get this badly wrong; here's how to get it right.
There's a seductive shortcut hiding inside most LLM user simulations: the assumption that “an AI user” means one user. A sort of composite, averaged-out visitor who clicks through your flows like a polite focus group participant. They don't have strong feelings. They don't abandon carts because your shipping estimate looked sketchy. They don't rage-quit when you ask them to create an account for the fourth time.
Real users do all of those things. And they do them differently from each other, based on who they are, what they're trying to accomplish, and how much patience they brought to your site today. If your simulation doesn't encode that variation, you're not testing your product; you're testing an idealized version of it that doesn't exist.
Persona-driven LLM simulations are the fix. But “persona-driven” is doing a lot of work in that sentence, and most implementations are shallower than they need to be. Here's what actually matters.
“You're not testing your product, you're testing an idealized version of it that doesn't exist.”
What a Synthetic Persona Actually Is
A synthetic persona isn't a name and a stock photo. It's a structured profile that changes how an LLM agent makes decisions: what it clicks, where it hesitates, when it decides the page isn't worth its time. Think of it as conditioning the agent's entire decision policy, not just its tone of voice.
The components that actually move the needle are less obvious than demographics. Yes, you need role and context (a commuting mobile shopper behaves differently than someone at a desktop with two monitors). But the attributes that drive the most interesting behavioral divergence are the psychographic ones: risk tolerance, trust level, patience, how much cognitive load this person walked in with. Research frameworks like SimUSER and Customer-R1 show that conditioning on explicit persona attributes produces action distributions that actually resemble real users, not just plausible-sounding journeys. [1]
Goals and constraints matter enormously, and they're chronically underspecified. “Compare prices quickly” and “avoid creating an account” are not the same as “find the cheapest option.” The first persona will bail the moment your filtering UX gets complicated; the second will tolerate more friction to reach the number they need. Build in those tensions explicitly. Add remembered frustrations. Give your persona a history with your category. [2]
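To make this concrete, here's one way such a profile might be encoded. The field names, the 0-to-1 scales, and the example values are all illustrative assumptions, not a schema from SimUSER or Customer-R1:

```python
from dataclasses import dataclass, field

@dataclass
class Persona:
    """Illustrative synthetic-persona profile. Field names and 0-1 scales
    are assumptions for this sketch, not a standard schema."""
    name: str
    role_context: str         # e.g. "commuting mobile shopper"
    risk_tolerance: float     # 0 = avoids anything irreversible, 1 = clicks freely
    trust_level: float        # 0 = assumes hidden fees, 1 = takes the UI at face value
    patience: float           # 0 = bails at the first friction, 1 = grinds through
    cognitive_load: float     # how much attention was already spent before arriving
    goals: list = field(default_factory=list)
    constraints: list = field(default_factory=list)
    remembered_frustrations: list = field(default_factory=list)

# Example instance: the tensions live in goals vs. constraints.
bargain_hunter = Persona(
    name="Suspicious Shopper",
    role_context="price-sensitive, on mobile, comparison-shopping in another tab",
    risk_tolerance=0.3,
    trust_level=0.2,
    patience=0.5,
    cognitive_load=0.7,
    goals=["find the cheapest option", "verify the discount applies before tax"],
    constraints=["avoid creating an account"],
    remembered_frustrations=["a coupon that silently failed at checkout last month"],
)
```

The point of the explicit scales is that they can condition an agent's decision policy, not just flavor its prose.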
How Personas Drive Different Behavior on the Same Screen
Here's the thing that makes this approach genuinely useful for UX work: once you have well-specified personas, you can run them through identical flows and watch them break on completely different steps.
Take a checkout flow. A convenience-first persona (high trust, decisive, not price-sensitive) sails past your product page and goes straight to “Buy Now.” A bargain-hunter persona dives into filters, reads three reviews, opens a coupon tab, and abandons when they can't figure out whether the discount applies before or after tax. A privacy-anxious persona gets to account creation and leaves immediately. Same flow. Three distinct drop-off points. Three distinct fixes.
Customer-R1, a recent LLM agent framework for online shopping simulation, feeds each agent both the HTML of the current page and an explicit persona block, then asks it to generate a rationale and a concrete next action (click, type, or exit) at every step. Conditioning on persona significantly improves next-action prediction and produces journeys that more closely match real individual session data. [3]
The practical implication: the rationale step is where the value is. It tells you why a persona hesitates, which is the thing you actually need to fix.
This is exactly what happens with real heterogeneous traffic, and it's what generic user simulations completely miss. The friction on your permissions screen isn't universal. It's catastrophic for the low-trust, time-poor persona and invisible to the power user who's been through worse onboarding flows and survived. [4]
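A single step of that rationale-then-action loop could be sketched like this. The prompt wording and the JSON response contract are assumptions for illustration, not Customer-R1's actual format, and `call_llm` is an injected placeholder for whatever LLM client you use:

```python
import json

def agent_step(persona: dict, page_html: str, call_llm) -> dict:
    """One persona-conditioned decision step: condition on persona + page,
    ask for a rationale first, then a single concrete action."""
    prompt = (
        "You are simulating this user:\n"
        f"{json.dumps(persona, indent=2)}\n\n"
        "Current page HTML:\n"
        f"{page_html}\n\n"
        "First explain your reasoning, then choose exactly one action.\n"
        'Reply as JSON: {"rationale": "...", "action": "click|type|exit", '
        '"target": "..."}'
    )
    step = json.loads(call_llm(prompt))
    assert step["action"] in {"click", "type", "exit"}
    return step

# Stub LLM so the sketch runs without an API key; a real run would swap
# in an actual model client here.
def fake_llm(prompt: str) -> str:
    return json.dumps({
        "rationale": "Account creation is forced and I don't trust this site.",
        "action": "exit",
        "target": None,
    })

step = agent_step(
    {"name": "Suspicious Shopper", "trust_level": 0.2},
    "<form id='signup'>Create an account to continue</form>",
    fake_llm,
)
print(step["action"])  # exit
```

Keeping the rationale in the parsed output is what turns a drop-off count into a diagnosable reason.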
Where Good Personas Come From
This is where most teams cut corners and pay for it later. Synthetic personas invented from intuition tend to cluster around a few comfortable archetypes (the Power User, the Confused Grandparent, the Security-Conscious IT Admin) that feel representative but aren't grounded in your actual user base. They reproduce the biases of whoever wrote them, not the distribution of your real traffic. [5]
The better approach is to work backwards from data you already have. Cluster your analytics along behavioral dimensions (path variability, feature usage, drop-off patterns, purchase frequency) and let the clusters tell you what the meaningful segments actually are. Then use interview transcripts and survey data to give each cluster its motivations, language, and mental models. SimUSER does this automatically for recommender system data, inferring self-consistent personas from historical interactions and enriching them with personality traits and behavioral tendencies. [6]
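The clustering step itself is ordinary unsupervised learning. A minimal pure-stdlib k-means over made-up behavioral features gives the idea; a real pipeline would use scikit-learn or similar, with more features and a principled choice of k:

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Minimal k-means over behavioral feature vectors (illustrative;
    use a real library in production)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster.
        centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical per-user features: (avg pages per session, purchase frequency).
sessions = [
    (2.1, 0.10), (1.8, 0.20), (2.4, 0.15),   # shallow visits, rarely buys
    (9.5, 0.80), (10.2, 0.90), (8.8, 0.85),  # deep browsing, buys often
]
centroids, clusters = kmeans(sessions, k=2)
```

Each resulting cluster is a candidate segment; the interview and survey data then supply its motivations and language.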
A Practical Cast of Characters
Most product teams don't need a hundred personas; they need three to five high-leverage ones per critical journey, designed to stress-test the assumptions the flow is built on. Here's a starting cast for three common flows:
Onboarding
The Anxious Newcomer
New to the category. Low tech confidence. High fear of making an irreversible mistake. Reads every tooltip. Abandons at any form that asks for more than a name and email.
Checkout
The Suspicious Shopper
Price-sensitive. Distrustful of hidden fees. On mobile with one hand. Already comparison-shopping in another tab. Leaves at the first sign of forced account creation.
Enterprise / Admin
The Overloaded Manager
High security awareness. No time to read documentation. Needs every action to be auditable and reversible. Will escalate to IT rather than guess.
The point isn't that these are the right personas for your product; it's that each one encodes a distinct tolerance profile that will break at a different place in your flow. Run them all, compare the breakpoints, and you have a prioritized friction map. [2]
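In miniature, a friction map can be thought of as matching each flow step's demands against each persona's tolerances and recording the first step that exceeds them. All the step names and numbers below are made up for illustration:

```python
# Each flow step declares what it demands of the user; each persona declares
# what it will tolerate. All values are illustrative, not measured.
FLOW = [
    ("landing",          {"patience": 0.1, "trust": 0.1}),
    ("complex_filters",  {"patience": 0.6, "trust": 0.2}),
    ("account_creation", {"patience": 0.3, "trust": 0.7}),
    ("payment",          {"patience": 0.2, "trust": 0.5}),
]

PERSONAS = {
    "Decisive Buyer":     {"patience": 0.9, "trust": 0.9},
    "Anxious Newcomer":   {"patience": 0.7, "trust": 0.5},
    "Overloaded Manager": {"patience": 0.4, "trust": 0.8},
}

def breakpoint_for(persona):
    """Return the first step whose demands exceed the persona's tolerances,
    or None if the persona completes the flow."""
    for step, demands in FLOW:
        if any(persona[dim] < need for dim, need in demands.items()):
            return step
    return None

friction_map = {name: breakpoint_for(p) for name, p in PERSONAS.items()}
print(friction_map)
```

Same flow, three different outcomes: one persona completes, one stalls in the filters, one balks at account creation. That per-persona breakpoint list is the prioritized friction map.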
The Ways This Goes Wrong
Persona-based simulation is easy to do in a way that makes you feel productive without being useful. A few failure modes worth naming explicitly:
The vanity persona. Richly described, psychologically plausible, and entirely divorced from your actual user distribution. If your median user is a 34-year-old mid-market operations manager and your simulation only includes power users and total novices, your results tell you nothing about your real funnel. Anchor persona distributions to your actual data. [5]
The unvalidated simulation. Running synthetic personas through a flow without ever checking whether the behavior resembles real sessions. Do the synthetic journeys produce roughly the same success rates, click distributions, and drop-off reasons as your analytics? If not, your personas are wrong, not your product. Benchmark against humans before you trust the synthetic signals. [2]
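One cheap validation check is to compare the synthetic and real drop-off distributions directly. A sketch using total variation distance, with made-up numbers and a threshold that is a judgment call rather than a standard:

```python
def total_variation(p, q):
    """Half the L1 distance between two distributions over the same ordered
    buckets; 0 = identical, 1 = completely disjoint."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Share of sessions abandoning at each step (landing, filters, account,
# payment), with completions as the final bucket. Numbers are illustrative.
real      = [0.10, 0.25, 0.30, 0.05, 0.30]
synthetic = [0.12, 0.20, 0.35, 0.03, 0.30]

tv = total_variation(real, synthetic)
if tv > 0.15:  # illustrative threshold, not a standard
    print(f"personas diverge from real traffic (TV={tv:.2f}); revisit them")
else:
    print(f"synthetic drop-off roughly matches reality (TV={tv:.2f})")
```

The same comparison works for success rates and click distributions; the point is to benchmark before trusting the synthetic signal, not which distance metric you pick.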
The stale persona. Personas built on last year's research, treated as permanent fixtures. Your users change. Your market changes. A persona that accurately represented your early adopters may be useless for describing your current cohort. Build a process for updating them. [8]
Where Human Research Still Wins
None of this replaces talking to real people. It changes when you do it and what you ask.
Early in a design cycle, synthetic personas are excellent for pressure-testing flows cheaply, catching the obvious cliffs before you've spent time on visual polish, identifying which hypotheses are worth testing at all. They're available at 11pm. They don't get tired. You can run fifty variants in an afternoon. [1]
But as you approach high-stakes decisions (pricing pages, paywall design, major information architecture changes), you still want moderated sessions and live experiments. Synthetic personas can tell you where a flow might break for a certain kind of person; they can't tell you whether your hypothesis about why it breaks is actually correct. That still requires a real person explaining their reasoning in their own words. [9]
The most useful framing: think of persona-conditioned LLMs as a permanent testing panel you can always call on. The rushed shopper, the skeptical IT lead, the confused first-timer: they walk your flows before your customers do. Their job is to make sure that by the time real people arrive, your biggest friction points are already visible and already on a roadmap. Not to replace the real people. To make the time you spend with real people count for more.
References
1. Wang, Z. et al. (2025). SimUSER / Customer-R1 persona conditioning research. huggingface.co
2. Wang, Z., Lu, Y., Zhang, Y., Huang, J., & Wang, D. (2025). Customer-R1: Personalized Simulation of Human Behaviors via RL-based LLM Agent in Online Shopping. arXiv:2510.07230
3. The Moonlight review of Customer-R1. themoonlight.io
4. Nielsen Norman Group. Evaluating AI-Simulated Behavior: Insights from Three Studies. nngroup.com
5. Anonymous (2025). LLM Generated Persona is a Promise with a Catch. arXiv:2503.16527
6. Bougie, N. et al. (2025). SimUSER: Simulating User Behavior with LLMs for Recommender Systems. nicolas99-9.github.io
7. OPeRA shopping dataset. scholar.google.com
8. Emergent Mind. User Simulator Behavior: Mechanisms & Metrics. emergentmind.com
9. Nielsen Norman Group. Synthetic Users. nngroup.com
