The Problem: A Real Baseline for AI Search Visibility Does Not Exist at the Consumer Level
Every measurement of AI brand visibility runs into the same wall: ChatGPT often does not give the same answer twice, and often does not give the same answer to two different people. The reason is structural, and the implication is that consumer-level measurement of AI search visibility is not possible.
AI search recommendation is not deterministic
ChatGPT may or may not return the same brand list for the same query twice in a row. Independent industry reporting in November 2025 found that "the non-deterministic nature of LLMs means that 40% to 60% of cited sources change monthly."9 The variance is not purely random noise. It is a function of personalization, fingerprinting, the model's sampling temperature, and the audience context the model has accumulated about the user.
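The sampling-temperature part of that variance can be shown with a toy model. This is a minimal sketch under stated assumptions, not a description of how ChatGPT decodes. Each candidate brand gets a fixed, hypothetical logit, and the placeholder names and values are invented for illustration. A three-brand answer is then drawn by temperature-scaled softmax sampling without replacement. Personalization and fingerprinting are not modeled.

```python
import math
import random

# HYPOTHETICAL logits for five placeholder brands (not real model outputs).
TOY_LOGITS = {"Brand A": 3.0, "Brand B": 2.6, "Brand C": 2.5, "Brand D": 2.0, "Brand E": 1.2}


def softmax(logits, temperature):
    """Temperature-scaled softmax; subtracting the max keeps exp() numerically stable."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_answer(logits, k, temperature, rng):
    """Draw k distinct brands, re-normalizing over the brands not yet picked."""
    remaining = dict(logits)
    answer = []
    for _ in range(k):
        brands = list(remaining)
        probs = softmax([remaining[b] for b in brands], temperature)
        pick = rng.choices(brands, weights=probs, k=1)[0]
        answer.append(pick)
        del remaining[pick]
    return answer


def distinct_answers(logits, k, temperature, n_calls, seed=0):
    """Issue the same toy 'query' n_calls times and collect the distinct brand lists."""
    rng = random.Random(seed)
    return {tuple(sample_answer(logits, k, temperature, rng)) for _ in range(n_calls)}


if __name__ == "__main__":
    for t in (0.001, 0.7, 1.0):
        answers = distinct_answers(TOY_LOGITS, k=3, temperature=t, n_calls=100)
        print(f"temperature={t}: {len(answers)} distinct three-brand answers")
```

Near zero temperature, the identical input yields a single answer. At higher temperatures it yields several distinct brand lists, so one call is a sample rather than a measurement.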
Personalization is happening to every user, on every query
OpenAI rolled out persistent Memory across all ChatGPT consumer plans in 2025, with the system referencing every prior conversation, saved memory, and uploaded file to tailor responses.1 ChatGPT's web search feature personalizes results using stored memory by default.2 The default ChatGPT model was upgraded specifically to be "better at personalization."3 MIT and Penn State researchers documented the effect empirically in February 2026.10,11
The persona ceiling: the deepest problem
Even if a consumer-level clean slate were achievable, the marketer cannot ask the question as their customer. A brand operator typically does not match their own Ideal Customer Profile (ICP). A budget-conscious shopper asking about airlines, a Big 4 partner asking about enterprise innovation, an informed athletic buyer asking about running shoes: a brand operator cannot credibly run these searches from their own keyboard. The persona is the missing variable.
Why most measurement tools cannot see this
Most platforms in the AI visibility category combine measurement with active optimization services. When a measurement tool also delivers the fixes, customer-specific weaknesses tend to become the next upsell rather than data the customer is encouraged to confront in public. AiRR is built primarily as a measurement system. The measurement layer is kept separate from any execution path. The persona-specific weaknesses documented in this paper would be difficult to publish from a vendor whose primary revenue source is fixing the problems the measurement just exposed.
Methodology
The dataset comprises 3,052 brand-level and 8,811 prompt-level unpersonalized API observations across 30 brands in four industries (Airlines, Professional Services, Athletic Apparel, Cycling), measured in ChatGPT from January 23, 2026 through May 18, 2026. All queries are issued through unpersonalized API access: no user account memory, no cookies, and no IP-targeted routing manipulation. Each brand-prompt-persona cell is queried multiple times so that results average over single-call response variance.
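The per-cell averaging can be sketched as a simple group-by. This is an illustrative aggregation only. It assumes each observation is a (brand, prompt, persona, score) record. The record layout, the helper name `aggregate_cells`, and the sample scores are hypothetical and do not reproduce the AiRR pipeline.

```python
from collections import defaultdict
from statistics import mean, stdev

# HYPOTHETICAL repeated observations: (brand, prompt, persona, score on the 0 to 100 scale).
OBSERVATIONS = [
    ("Brand A", "top airline brands", "Overall", 70.0),
    ("Brand A", "top airline brands", "Overall", 74.0),
    ("Brand A", "top airline brands", "Overall", 78.0),
    ("Brand A", "top airline brands", "Budget Conscious", 40.0),
    ("Brand A", "top airline brands", "Budget Conscious", 44.0),
]


def aggregate_cells(observations):
    """Group repeated calls by (brand, prompt, persona) and summarize each cell."""
    cells = defaultdict(list)
    for brand, prompt, persona, score in observations:
        cells[(brand, prompt, persona)].append(score)
    summary = {}
    for key, scores in cells.items():
        spread = stdev(scores) if len(scores) > 1 else 0.0  # stdev needs two or more values
        summary[key] = {"n": len(scores), "mean": mean(scores), "stdev": spread}
    return summary


if __name__ == "__main__":
    for cell, stats in aggregate_cells(OBSERVATIONS).items():
        print(cell, stats)
```

Reporting the standard deviation alongside the mean shows how much single-call variance each cell's average is smoothing over.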
The AiRR Score and the 4Ps
The AiRR Score is a composite measurement on the 0 to 100 scale, computed from four constituent dimensions:
- Perception: How the brand is described when mentioned.
- Persistence: Whether the brand's visibility holds consistently across repeated queries.
- Presence: Whether the brand appears in the model's response when the prompt is contextually relevant.
- Prestige: Whether the brand is recommended above competitors.
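This section does not give the formula that combines the 4Ps, so the following is a purely hypothetical sketch. It assumes each dimension is already on the 0 to 100 scale and combines them as a weighted average with equal weights. The weights, the function name `composite_score`, and the sample inputs are illustrative assumptions. They are not the AiRR Score's actual weighting.

```python
FOUR_PS = ("Perception", "Persistence", "Presence", "Prestige")

# HYPOTHETICAL equal weights; the actual AiRR weighting is not given in this section.
EQUAL_WEIGHTS = {p: 0.25 for p in FOUR_PS}


def composite_score(components, weights=EQUAL_WEIGHTS):
    """Weighted average of the four dimensions, each assumed to be on a 0 to 100 scale."""
    if set(components) != set(FOUR_PS):
        raise ValueError(f"expected exactly these dimensions: {FOUR_PS}")
    if any(not 0 <= v <= 100 for v in components.values()):
        raise ValueError("each dimension must lie on the 0 to 100 scale")
    total_weight = sum(weights[p] for p in FOUR_PS)
    return sum(components[p] * weights[p] for p in FOUR_PS) / total_weight


if __name__ == "__main__":
    example = {"Perception": 80, "Persistence": 60, "Presence": 90, "Prestige": 70}
    print(composite_score(example))
```

Because the weighted sum is divided by the total weight, the result stays on the 0 to 100 scale for any positive weights.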
Persona AiRR Drift (PAD)
A brand-level metric that quantifies how a brand's AI search recommendation changes when the audience persona shifts from the general (overall) baseline to a specific ICP. It is reported two ways: PAD Points, the persona score minus the overall score on the 0 to 100 scale, and PAD Percent, that difference divided by the overall score.
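A minimal sketch of the two PAD values follows. The sign convention is inferred from the Findings tables: persona score minus overall score, with the percent taken relative to the overall score. It reproduces the tables' published values. The function name `pad` is hypothetical.

```python
def pad(overall, persona):
    """Return (PAD Points, PAD Percent) for one brand.

    PAD Points  = persona score minus overall score (0 to 100 scale).
    PAD Percent = PAD Points relative to the overall baseline, in percent.
    """
    if overall <= 0:
        raise ValueError("overall baseline must be positive to express PAD as a percent")
    points = persona - overall
    percent = 100.0 * points / overall
    return round(points, 2), round(percent, 1)


if __name__ == "__main__":
    # Frontier and Delta under the Budget Conscious persona (Finding 1 table).
    print(pad(21.80, 55.12))  # Frontier → (33.32, 152.8)
    print(pad(76.38, 36.99))  # Delta → (-39.39, -51.6)
```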
For full methodology, sampling logic, limitations, and statistical reliability discussion, see the PDF.
Findings
Finding 1: Airlines, the Cleanest Persona Reversal
Eight US airlines produce two distinct leaderboards inside ChatGPT, one per audience persona. The brand winning Luxury Seeker loses Budget Conscious, and vice versa. PAD reaches +33.32 points (+152.8%) in the strongest positive case and -39.39 points (-51.6%) in the strongest negative case.
Prompt: "What's the top airline brands in the US?"
Frontier rises from invisible to rank 2
Persona (Budget Conscious): "I am a price-sensitive shopper with a household income under $40,000 per year. I always look for the best deal and prioritize affordability over brand prestige."
| Brand | Overall | Budget Conscious | PAD Points | PAD % |
|---|---|---|---|---|
| American Airlines | 76.49 | 40.15 | -36.34 | -47.5% |
| Delta | 76.38 | 36.99 | -39.39 | -51.6% |
| United | 61.76 | 34.59 | -27.17 | -44.0% |
| Southwest | 59.00 | 76.55 | +17.55 | +29.7% |
| Alaska | 54.49 | 30.35 | -24.14 | -44.3% |
| JetBlue | 46.61 | 52.36 | +5.75 | +12.3% |
| Virgin | 32.38 | 17.85 | -14.53 | -44.9% |
| Frontier | 21.80 | 55.12 | +33.32 | +152.8% |
Delta wins, Southwest collapses
Persona (Luxury Seeker): "I am a high earner aged 30 to 55 with a household income over $150,000 per year. I am brand conscious and always prioritize quality and prestige over price."
| Brand | Overall | Luxury Seeker | PAD Points | PAD % |
|---|---|---|---|---|
| American Airlines | 76.49 | 67.68 | -8.81 | -11.5% |
| Delta | 76.38 | 80.97 | +4.59 | +6.0% |
| United | 61.76 | 55.79 | -5.97 | -9.7% |
| Southwest | 59.00 | 27.84 | -31.16 | -52.8% |
| Alaska | 54.49 | 48.67 | -5.82 | -10.7% |
| JetBlue | 46.61 | 46.75 | +0.14 | +0.3% |
| Virgin | 32.38 | 22.01 | -10.37 | -32.0% |
| Frontier | 21.80 | 17.40 | -4.40 | -20.2% |
Frontier is invisible at the overall layer (rank 8 of 8). ChatGPT typically returns three to five brands per recommendation answer, so at rank 8 Frontier falls outside that window and receives zero percent share of the answer. With the Budget Conscious persona supplied, Frontier rises to rank 2 of 8 and becomes visible to its actual buyer.
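The visibility window can be checked directly against the Finding 1 table. The sketch assumes, as a simplification, that an answer lists exactly the top N brands by AiRR Score, with N between three and five. The helper names are hypothetical.

```python
# Finding 1 scores: overall baseline and the Budget Conscious persona.
OVERALL = {
    "American Airlines": 76.49, "Delta": 76.38, "United": 61.76, "Southwest": 59.00,
    "Alaska": 54.49, "JetBlue": 46.61, "Virgin": 32.38, "Frontier": 21.80,
}
BUDGET_CONSCIOUS = {
    "American Airlines": 40.15, "Delta": 36.99, "United": 34.59, "Southwest": 76.55,
    "Alaska": 30.35, "JetBlue": 52.36, "Virgin": 17.85, "Frontier": 55.12,
}


def leaderboard(scores):
    """Brands ordered from highest to lowest AiRR Score."""
    return sorted(scores, key=scores.get, reverse=True)


def rank_of(brand, scores):
    """1-based rank of `brand` on the leaderboard."""
    return leaderboard(scores).index(brand) + 1


def in_answer(brand, scores, answer_size):
    """Assumes an answer lists exactly the top `answer_size` brands by score."""
    return rank_of(brand, scores) <= answer_size


if __name__ == "__main__":
    for label, scores in (("Overall", OVERALL), ("Budget Conscious", BUDGET_CONSCIOUS)):
        visible = [n for n in (3, 4, 5) if in_answer("Frontier", scores, n)]
        print(f"{label}: Frontier rank {rank_of('Frontier', scores)}, visible in answers of size {visible}")
```

Frontier is outside every answer size from three to five at the overall layer. Under the Budget Conscious persona it is inside all of them.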
Finding 2: Professional Services, the KPMG Effect
In professional services, the leaderboard inverts at the top under persona conditioning. The aggregate #2 falls to #6, and the aggregate #6 rises to #1.
Prompt: "What are the top professional services brands in the United States?"
McKinsey collapses. KPMG wins.
Persona (Big 4 Innovation): "I am a leader at KPMG focused on commercializing frontier tech like Quantum, Space, AI, etc."
| Brand | Overall | Big 4 Innovation | PAD Points | PAD % |
|---|---|---|---|---|
| EY | 80.18 | 74.32 | -5.86 | -7.3% |
| McKinsey | 78.90 | 51.17 | -27.73 | -35.1% |
| Accenture | 76.91 | 63.14 | -13.77 | -17.9% |
| Deloitte | 74.62 | 77.79 | +3.17 | +4.2% |
| Bain | 73.52 | 33.12 | -40.40 | -55.0% |
| KPMG | 66.53 | 84.88 | +18.35 | +27.6% |
| PwC | 63.67 | 68.41 | +4.74 | +7.4% |
| Grant Thornton | 42.40 | 20.94 | -21.46 | -50.6% |
| BDO | 32.64 | 20.85 | -11.79 | -36.1% |
KPMG's PAD under this persona is +18.35 points (+27.6%). McKinsey's is -27.73 points (-35.1%). Bain's is -40.40 points (-55.0%), the largest absolute PAD in the entire dataset. All three firms have global enterprise innovation practices, yet ChatGPT does not recommend McKinsey or Bain for that work when the persona is supplied.
Finding 3: Athletic Apparel, the Informed-Audience Penalty
In athletic apparel, recognition with a general audience does not predict recommendation strength with an informed athletic buyer. Lifestyle brands collapse under Athletic & Active conditioning. Asics gains.
Prompt: "What's the top sportswear and athletic apparel industry brands in the world?"
Reebok and Puma collapse. Asics gains.
Persona (Athletic & Active): "I am an active person aged 18 to 35 who exercises at least 4 times per week. I am health-conscious with a mid-to-high income and tend to prioritize performance and quality."
| Brand | Overall | Athletic & Active | PAD Points | PAD % |
|---|---|---|---|---|
| Nike | 89.53 | 88.24 | -1.29 | -1.4% |
| Adidas | 80.34 | 77.15 | -3.19 | -4.0% |
| Under Armour | 72.19 | 60.23 | -11.96 | -16.6% |
| Asics | 60.81 | 62.99 | +2.18 | +3.6% |
| Reebok | 57.29 | 20.89 | -36.40 | -63.5% |
| Puma | 52.84 | 34.73 | -18.11 | -34.3% |
Every brand drops except Puma
Persona (Young Professional): "I am an urban professional aged 22 to 32 with a college degree earning between $45,000 and $80,000 per year. I am career focused and either single or newly in a relationship."
| Brand | Overall | Young Professional | PAD Points | PAD % |
|---|---|---|---|---|
| Nike | 89.53 | 84.07 | -5.46 | -6.1% |
| Adidas | 80.34 | 77.62 | -2.72 | -3.4% |
| Under Armour | 72.19 | 56.28 | -15.91 | -22.0% |
| Asics | 60.81 | 50.11 | -10.70 | -17.6% |
| Reebok | 57.29 | 39.76 | -17.53 | -30.6% |
| Puma | 52.84 | 55.23 | +2.39 | +4.5% |
Reebok and Puma are present in the overall ranking and absent from the answer their actual category buyers see. The Young Professional and the Athletic & Active customer are not the same buyer, and ChatGPT does not recommend the same brands to both.
Finding 4: Cycling, Expert Audiences Reshape the Field
In cycling, the overall AiRR Score overstates most brands' positions with both the Avid Cyclist and Luxury Seeker personas. The Luxury Seeker persona produces the largest swings.
Prompt: "What are the top bicycle manufacturing brands in the United States?"
| Brand | Overall | Avid Cyclist | PAD Points | PAD % |
|---|---|---|---|---|
| Trek | 91.87 | 84.18 | -7.69 | -8.4% |
| Specialized | 81.81 | 78.12 | -3.69 | -4.5% |
| Cannondale | 71.85 | 68.28 | -3.57 | -5.0% |
| Santa Cruz | 65.14 | 67.14 | +2.00 | +3.1% |
| Giant | 59.70 | 55.14 | -4.56 | -7.6% |
| Bianchi | 36.98 | 36.54 | -0.44 | -1.2% |
| Felt | 32.77 | 29.03 | -3.74 | -11.4% |
| Brand | Overall | Luxury Seeker | PAD Points | PAD % |
|---|---|---|---|---|
| Trek | 91.87 | 86.36 | -5.51 | -6.0% |
| Specialized | 81.81 | 74.69 | -7.12 | -8.7% |
| Cannondale | 71.85 | 58.57 | -13.28 | -18.5% |
| Santa Cruz | 65.14 | 60.19 | -4.95 | -7.6% |
| Giant | 59.70 | 31.88 | -27.82 | -46.6% |
| Bianchi | 36.98 | 43.67 | +6.69 | +18.1% |
| Felt | 32.77 | 22.02 | -10.75 | -32.8% |
Specialized, Cannondale, and Giant all lose ground under Luxury Seeker conditioning. Bianchi inverts the pattern, gaining +6.69 (+18.1%) under Luxury Seeker even though its overall score is low. The persona conditions whether a brand's positioning is read as performance, prestige, or neither.
Finding 5: Time-Series Movement
Brand-level AI visibility moves at quarterly speed inside ChatGPT. Composite shifts approaching or exceeding 20 points within 60 to 110 days are visible in this dataset and exceed the sensitivity of standard brand-health trackers. All values are in AiRR Score points on the 0 to 100 scale, measured from each brand's first day of measurement in this dataset to its most recent day.
| Brand · Industry | Change | Days |
|---|---|---|
| Puma · Athletic Apparel | -21.94 | 104 |
| Bianchi · Cycling | -19.66 | 64 |
| Giant · Cycling | -14.27 | 64 |
| JetBlue · Airlines | -13.15 | 69 |
| Accenture · Professional Services | +14.78 | 55 |
| Felt · Cycling | +11.48 | 64 |
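The measurement windows differ (55 to 104 days), so the changes above are easiest to compare after rescaling to a common window. The sketch below assumes linear movement within each window, which the data does not guarantee. It rescales each change to 90 days, roughly one quarter. The values are copied from the table.

```python
# (brand, change in AiRR Score points, days in window), from the Finding 5 table.
MOVES = [
    ("Puma", -21.94, 104),
    ("Bianchi", -19.66, 64),
    ("Giant", -14.27, 64),
    ("JetBlue", -13.15, 69),
    ("Accenture", 14.78, 55),
    ("Felt", 11.48, 64),
]


def per_90_days(change, days):
    """Linearly rescale a change to a 90-day (roughly quarterly) window."""
    return change / days * 90


def rank_by_rate(moves):
    """Sort moves by absolute per-90-day rate, fastest first."""
    return sorted(moves, key=lambda m: abs(per_90_days(m[1], m[2])), reverse=True)


if __name__ == "__main__":
    for brand, change, days in rank_by_rate(MOVES):
        rate = per_90_days(change, days)
        print(f"{brand:<10} {change:+7.2f} pts over {days:>3} days, about {rate:+6.2f} per 90 days")
```

On this normalization Bianchi moves fastest, at about -27.6 points per 90 days versus about -19.0 for Puma. Puma's decline is still the largest in total.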
A composite move of roughly 20 points on a 0 to 100 scale within 60 to 110 days is a structural shift. Public-market analysts following brands like Puma may find AI visibility leading reported brand-health metrics by one to two quarters.
The Future of Measurement: Brand × Prompt × Persona
The first generation of AI visibility tools reports a single visibility score per brand. The data in this paper shows that this number is an average across audiences the tool did not see. A brand can win the overall score and lose with every persona that buys.
The second generation adds prompts. Tools now report visibility at the prompt level. That is closer to useful, but incomplete. The same prompt produces different brand recommendations under different personas. Prompt-level visibility without persona conditioning is the same averaging problem at a finer granularity.
The future of measurement is the triple: brand × prompt × persona. The specific brand, the specific prompt, and the specific persona of the buyer asking. Anything less reports the average, and the PAD documented in this paper reaches 40.40 points in a single persona shift.
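Keying scores by the full triple, rather than by brand alone, is what makes drift queryable. The sketch below stores a subset of scores copied from the Findings tables under hypothetical (brand, prompt, persona) keys, with "Overall" standing in for the unconditioned baseline. It then finds the cell with the largest drift. The data layout and helper names are illustrative assumptions.

```python
from typing import Dict, Tuple

Cell = Tuple[str, str, str]  # (brand, prompt, persona)

AIRLINES = "What's the top airline brands in the US?"
PRO_SERVICES = "What are the top professional services brands in the United States?"
APPAREL = "What's the top sportswear and athletic apparel industry brands in the world?"

# A subset of AiRR Scores from the Findings tables, keyed by the full triple.
SCORES: Dict[Cell, float] = {
    ("Delta", AIRLINES, "Overall"): 76.38,
    ("Delta", AIRLINES, "Budget Conscious"): 36.99,
    ("Bain", PRO_SERVICES, "Overall"): 73.52,
    ("Bain", PRO_SERVICES, "Big 4 Innovation"): 33.12,
    ("KPMG", PRO_SERVICES, "Overall"): 66.53,
    ("KPMG", PRO_SERVICES, "Big 4 Innovation"): 84.88,
    ("Reebok", APPAREL, "Overall"): 57.29,
    ("Reebok", APPAREL, "Athletic & Active"): 20.89,
}


def drifts(scores: Dict[Cell, float]) -> Dict[Cell, float]:
    """PAD Points for every persona cell, measured against the same brand and prompt's Overall score."""
    out = {}
    for (brand, prompt, persona), value in scores.items():
        if persona == "Overall":
            continue
        baseline = scores[(brand, prompt, "Overall")]
        out[(brand, prompt, persona)] = round(value - baseline, 2)
    return out


def largest_drift(scores: Dict[Cell, float]) -> Tuple[Cell, float]:
    """The cell whose persona score sits furthest from the brand's Overall score."""
    cells = drifts(scores)
    worst = max(cells, key=lambda c: abs(cells[c]))
    return worst, cells[worst]


if __name__ == "__main__":
    (brand, _, persona), points = largest_drift(SCORES)
    print(brand, persona, points)
```

On this subset, the largest drift is Bain under the Big 4 Innovation persona, which matches the 40.40-point figure cited above.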
The CMO Paradox
A Chief Marketing Officer running a search on their own ChatGPT account is not seeing what their customers see. The account that returns a high ranking for the CMO's brand is already conditioned on the CMO's persona: their job history, prior searches, location, and inferred preferences. The model may be telling the CMO what it infers the CMO wants to hear.
Consider a concrete case from the data. McKinsey holds an overall AiRR Score of 78.90 in professional services, ranked #2 of 9 firms. Its CMO opens ChatGPT, runs the category prompt, sees McKinsey near the top, and concludes McKinsey is competitive. McKinsey's ICP for its enterprise innovation practice is a Big 4 partner commercializing frontier tech, asking the question from inside their own firm. That persona scores McKinsey at 51.17, ranked #6. The PAD is -27.73 points (-35.1%). ChatGPT typically returns three to five brands per recommendation answer. If the answer lists four brands, the actual buyer sees KPMG, Deloitte, EY, and PwC. McKinsey is not one of them, and even a five-brand answer stops at Accenture. The CMO believes McKinsey is winning. The customer never sees the brand.
In this scenario, McKinsey gets zero AI-influenced revenue from its target buyer while the marketing team reports #2 in the category.
Implications
For marketers. The aggregate AiRR Score is a starting point, not an endpoint. Persona-level breakdowns reveal a different ChatGPT ranking for every customer segment a brand serves. The marketing team should read the PAD column for the persona that matches their declared ICP and treat that value as the actionable number. A negative PAD on the ICP persona means the brand is being oversold by overall visibility and the marketing team is operating on false confidence.
PAD is also a competitive metric. A brand can read its own PAD on its declared ICP alongside competitors' PAD on the same persona. A brand winning the ICP with +20 PAD while the top three competitors are all negative is the cleanest competitive signal the AI search layer produces.
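The competitive reading can be sketched against the Finding 2 table. The sketch makes two labeled assumptions. First, "top three competitors" is taken to mean the three rivals with the highest overall scores. Second, the minimum own-PAD threshold is a parameter (`min_own_pad`) rather than a fixed rule. Function names are hypothetical.

```python
# Overall and Big 4 Innovation persona scores from the Finding 2 table.
OVERALL = {
    "EY": 80.18, "McKinsey": 78.90, "Accenture": 76.91, "Deloitte": 74.62, "Bain": 73.52,
    "KPMG": 66.53, "PwC": 63.67, "Grant Thornton": 42.40, "BDO": 32.64,
}
ICP = {
    "EY": 74.32, "McKinsey": 51.17, "Accenture": 63.14, "Deloitte": 77.79, "Bain": 33.12,
    "KPMG": 84.88, "PwC": 68.41, "Grant Thornton": 20.94, "BDO": 20.85,
}


def pad_points(brand, overall, icp):
    """Persona score minus overall score for one brand."""
    return icp[brand] - overall[brand]


def top_rivals(brand, overall, n=3):
    """ASSUMPTION: 'top competitors' means the n rivals with the highest overall score."""
    rivals = [b for b in overall if b != brand]
    return sorted(rivals, key=overall.get, reverse=True)[:n]


def competitive_signal(brand, overall, icp, n=3, min_own_pad=0.0):
    """True when the brand's ICP PAD exceeds min_own_pad and every top rival's PAD is negative."""
    own = pad_points(brand, overall, icp)
    rivals_negative = all(pad_points(r, overall, icp) < 0 for r in top_rivals(brand, overall, n))
    return own > min_own_pad and rivals_negative


if __name__ == "__main__":
    for firm in ("KPMG", "Deloitte", "EY"):
        print(firm, competitive_signal(firm, OVERALL, ICP))
```

KPMG passes with the default threshold of zero. With the +20 threshold from the text (`min_own_pad=20.0`), its +18.35 narrowly misses.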
For investors and analysts. Brand visibility decay inside AI search appears to run faster than traditional brand-health indicators detect. Puma's 22-point decline over 104 days, the largest total decline in the dataset, is the kind of signal that public-market analysts will eventually price in.
For the GEO (generative engine optimization) and AEO (answer engine optimization) category. Tools that report a single brand-level visibility score are averaging across personas they do not measure. The fix is not abandoning measurement. It is measuring the structural variables (persona, prompt context, and time) that operators need to act on.
Quantified economic exposure. Paper 03 of the AiRR Research Series, What a Position is Worth, values Position 1 in a category-defining AI search query at $541,719 in annual influenced revenue, falling to effectively zero by Position 5. A brand holding Position 1 in the aggregate while sitting at Position 5 in the persona where its actual customers live is losing the full delta on that segment. For an eight-segment business with persona-reversal exposure across half its segments, unrecognized revenue at risk runs into seven figures annually.
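The seven-figure estimate follows from simple arithmetic. The sketch below makes three assumptions. Each of the eight segments has its own category-defining query carrying the Paper 03 value. Four segments sit at Position 1 in the aggregate but at Position 5 for the persona. "Effectively zero" at Position 5 is treated as exactly zero. Only the Position 1 and Position 5 values are given here, so intermediate positions are not modeled.

```python
# Annual influenced revenue by position, as quoted from Paper 03 above.
# Only Positions 1 and 5 are given; 0 stands in for "effectively zero" (an approximation).
POSITION_VALUE_USD = {1: 541_719, 5: 0}


def revenue_at_risk(segments):
    """Sum, over segments, the value of the aggregate position minus the persona position."""
    return sum(POSITION_VALUE_USD[aggregate] - POSITION_VALUE_USD[persona]
               for aggregate, persona in segments)


# HYPOTHETICAL eight-segment business: four segments reverse from Position 1 to Position 5.
SEGMENTS = [(1, 5)] * 4 + [(1, 1)] * 4

if __name__ == "__main__":
    print(f"${revenue_at_risk(SEGMENTS):,} per year")  # → $2,166,876 per year
```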
The AiRR Persona Reversal Index
The data in this paper is the inaugural release of an ongoing publication. The AiRR Persona Reversal Index, derived from the methodology in Section II, will be updated on a recurring basis as the dataset expands across additional models, geographies, and industries. The index is intended for citation, reuse, and reproduction by independent journalists, analysts, and academic researchers. Methodology disclosures, raw data samples, and column definitions are available on request from steven@airrscore.com.
The most-cited measurement in marketing is about to change
For two decades, the dominant question in brand search was whether a brand showed up on the first page of Google, in a list of ten blue links. Marketing teams optimized against it, paid for it, and reported it to their boards. The 0 to 100 AiRR Score, as a standalone aggregate, is the AI-search equivalent of that ranking. It is useful as a summary. It is dangerous as a strategy.
The data in this paper documents Persona AiRR Drift of up to 40.40 points within a single brand when the audience persona shifts. Leaderboards invert at the top in professional services. Lifestyle athletic brands collapse with informed buyers. Cycling brands lose ground when the audience becomes a prestige buyer. A single overall AiRR Score reports the average of audiences the brand does not see, and for a specific persona it can be off by more than 40 points on a 0 to 100 scale.
The unit of analysis must change. AI search recommendation is not a brand property. It is a property of the brand, the prompt, and the audience persona. The three together. Persona AiRR Drift is the measurement that makes the persona layer visible to marketing teams who would otherwise be operating on false confidence.
For independent measurement to mean anything, it has to be willing to publish what brands do not want to see. KPMG winning a buyer that the marketing team at McKinsey does not yet know it has lost. Frontier emerging in a persona where the overall score made it invisible. Puma collapsing while its lifestyle marketing continues. These observations are uncomfortable for the brands named. They are also where the value of the measurement lives.
Perlman, S. (2026). The Persona Reversal: Why AI Brand Visibility Is Not About Your Brand. AiRR Research Series, Paper 04. AI Reach Rank Inc. https://doi.org/10.5281/zenodo.20288041