The Persona Reversal

Section I

The Problem: A Real Baseline for AI Search Visibility Does Not Exist at the Consumer Level

Every measurement of AI brand visibility runs into the same wall: ChatGPT often does not give the same answer twice, and often does not give the same answer to two different people. The reason is structural, and the implication is that consumer-level measurement of AI search visibility is not possible.

AI search recommendation is not deterministic

ChatGPT may or may not return the same brand list for the same query twice in a row. Independent industry reporting in November 2025 found that "the non-deterministic nature of LLMs means that 40% to 60% of cited sources change monthly."⁹ The variance is not purely random noise. Beyond the model's sampling temperature, it is a function of personalization, fingerprinting, and the audience context the model has accumulated about the user.

Personalization is happening to every user, on every query

OpenAI rolled out persistent Memory across all ChatGPT consumer plans in 2025, with the system referencing every prior conversation, saved memory, and uploaded file to tailor responses.¹ ChatGPT's web search feature personalizes results using stored memory by default.² The default ChatGPT model was upgraded specifically to be "better at personalization."³ MIT and Penn State researchers documented the effect empirically in February 2026.¹⁰,¹¹

The persona ceiling: the deepest problem

Even if a consumer-level clean slate were achievable, the marketer cannot ask the question as their customer. A brand operator is typically not their own Ideal Customer Profile (ICP). A budget-conscious shopper asking about airlines, a Big 4 partner asking about enterprise innovation, an informed athletic buyer asking about running shoes: these are not searches the brand operator can credibly run from their own keyboard. The persona is the missing variable.

Why most measurement tools cannot see this

Most platforms in the AI visibility category combine measurement with active optimization services. When a measurement tool also delivers the fixes, customer-specific weaknesses tend to become the next upsell rather than data the customer is encouraged to confront in public. AiRR is built primarily as a measurement system. The measurement layer is kept separate from any execution path. The persona-specific weaknesses documented in this paper would be difficult to publish from a vendor whose primary revenue source is fixing the problems the measurement just exposed.

Across the dataset, AI search recommendations for a single brand shift by up to 40.40 points on a 0 to 100 scale when the audience persona changes from the overall baseline to a specific ICP.

Section II

Methodology

The dataset comprises 3,052 brand-level and 8,811 prompt-level unpersonalized API observations across 30 brands in 4 industries (Airlines, Professional Services, Athletic Apparel, Cycling), measured inside ChatGPT from January 23, 2026 through May 18, 2026. All queries are issued through unpersonalized API access, with no user account memory, no cookies, and no IP-targeted routing manipulation. Each brand-prompt-persona cell is queried multiple times to average over single-call response variance.
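The averaging of repeated calls per cell can be pictured with a short sketch. It is illustrative only: the observation values, the tuple layout, and the function name `average_cells` are assumptions made for this example, not the AiRR pipeline or its data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical repeated observations: (brand, prompt, persona, score).
# The scores are made-up placeholders, not values from the dataset.
observations = [
    ("BrandA", "top airline brands", "Budget Conscious", 52.0),
    ("BrandA", "top airline brands", "Budget Conscious", 58.0),
    ("BrandA", "top airline brands", "Budget Conscious", 55.0),
    ("BrandA", "top airline brands", "Overall", 20.0),
    ("BrandA", "top airline brands", "Overall", 24.0),
]

def average_cells(rows):
    """Group repeated calls by (brand, prompt, persona) and average each cell."""
    cells = defaultdict(list)
    for brand, prompt, persona, score in rows:
        cells[(brand, prompt, persona)].append(score)
    return {key: mean(scores) for key, scores in cells.items()}

cell_means = average_cells(observations)
for key, value in sorted(cell_means.items()):
    print(key, round(value, 2))
```

Averaging inside each cell before comparing cells keeps single-call variance from being mistaken for a persona effect.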

The AiRR Score and the 4Ps

The AiRR Score is a composite measurement on the 0 to 100 scale, computed from four constituent dimensions:

Perception: How the brand is described when mentioned.
Persistence: Whether the brand's visibility holds consistently across repeated queries.
Presence: Whether the brand appears in the model's response when the prompt is contextually relevant.
Prestige: Whether the brand is recommended above competitors.

Persona AiRR Drift (PAD)

A brand-level metric that quantifies how a brand's AI search recommendation changes when the audience persona shifts from the general (overall) baseline to a specific ICP. Reported as PAD Points (the change in score on the 0 to 100 scale) and PAD Percent (the change relative to the overall baseline).
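Read literally, the definition reduces to a subtraction and a ratio. The sketch below assumes that reading (persona score minus overall score, divided by the overall score); the function name `pad` and the rounding to two and one decimal places are choices made for this example, mirroring how the tables in Section III are printed.

```python
def pad(overall: float, persona: float) -> tuple[float, float]:
    """Return (PAD Points, PAD Percent) for one brand and one persona."""
    points = persona - overall
    percent = points / overall * 100  # relative to the overall baseline
    return round(points, 2), round(percent, 1)

# Frontier under the Budget Conscious persona (scores from Finding 1A).
print(pad(21.80, 55.12))   # (33.32, 152.8)

# Delta under the Budget Conscious persona (scores from Finding 1A).
print(pad(76.38, 36.99))   # (-39.39, -51.6)
```

Both results match the PAD Points and PAD % columns reported for those brands in Finding 1A.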

For full methodology, sampling logic, limitations, and statistical reliability discussion, see the PDF.

Section III

Findings

Finding 1: Airlines, the Cleanest Persona Reversal

Eight US airlines produce two distinct leaderboards inside ChatGPT, one per audience persona. The brand winning Luxury Seeker loses Budget Conscious, and vice versa. PAD reaches +33.32 points (+152.8%) in the strongest positive case and -39.39 points (-51.6%) in the strongest negative case.

Prompt: "What's the top airline brands in the US?"

Persona · Budget Conscious

Frontier rises from invisible to rank 2

I am a price-sensitive shopper with a household income under $40,000 per year. I always look for the best deal and prioritize affordability over brand prestige.

Finding 1A · Airlines, Budget Conscious as ICP

Brand	Overall	Budget Conscious	PAD Points	PAD %
American Airlines	76.49	40.15	-36.34	-47.5%
Delta	76.38	36.99	-39.39	-51.6%
United	61.76	34.59	-27.17	-44.0%
Southwest	59.00	76.55	+17.55	+29.7%
Alaska	54.49	30.35	-24.14	-44.3%
JetBlue	46.61	52.36	+5.75	+12.3%
Virgin	32.38	17.85	-14.53	-44.9%
Frontier	21.80	55.12	+33.32	+152.8%

Persona · Luxury Seeker

Delta wins, Southwest collapses

I am a high earner aged 30 to 55 with a household income over $150,000 per year. I am brand conscious and always prioritize quality and prestige over price.

Finding 1B · Airlines, Luxury Seeker as ICP

Brand	Overall	Luxury Seeker	PAD Points	PAD %
American Airlines	76.49	67.68	-8.81	-11.5%
Delta	76.38	80.97	+4.59	+6.0%
United	61.76	55.79	-5.97	-9.7%
Southwest	59.00	27.84	-31.16	-52.8%
Alaska	54.49	48.67	-5.82	-10.7%
JetBlue	46.61	46.75	+0.14	+0.3%
Virgin	32.38	22.01	-10.37	-32.0%
Frontier	21.80	17.40	-4.40	-20.2%

Frontier is invisible at the overall layer (rank 8 of 8). ChatGPT typically returns three to five brands per recommendation answer. At rank 8, Frontier garners zero percent share. With Budget Conscious supplied, Frontier rises to rank 2 of 8 and becomes fully visible to its actual buyer.
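The rank change can be reproduced from the Finding 1A table by sorting each score column. This is a minimal sketch, assuming rank is simply the position in a descending sort of the published scores; the helper name `leaderboard` is hypothetical.

```python
# Scores copied from Finding 1A: (overall, Budget Conscious).
scores = {
    "American Airlines": (76.49, 40.15),
    "Delta": (76.38, 36.99),
    "United": (61.76, 34.59),
    "Southwest": (59.00, 76.55),
    "Alaska": (54.49, 30.35),
    "JetBlue": (46.61, 52.36),
    "Virgin": (32.38, 17.85),
    "Frontier": (21.80, 55.12),
}

def leaderboard(column: int) -> list[str]:
    """Brands sorted from highest to lowest score in the given column."""
    return sorted(scores, key=lambda brand: scores[brand][column], reverse=True)

overall = leaderboard(0)
budget = leaderboard(1)
for brand in scores:
    print(f"{brand}: overall rank {overall.index(brand) + 1}, "
          f"Budget Conscious rank {budget.index(brand) + 1}")
```

Under this sort, Frontier sits at rank 8 on the overall column and rank 2 on the Budget Conscious column, behind Southwest.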

Finding 2: Professional Services, the KPMG Effect

In professional services, the leaderboard inverts at the top under persona conditioning. The aggregate #2 falls to #6, and the aggregate #6 rises to #1.

Prompt: "What are the top professional services brands in the United States?"

Persona · Enterprise Innovation at Big 4

McKinsey collapses. KPMG wins.

I am a leader at KPMG focused on commercializing frontier tech like Quantum, Space, AI, etc.

Finding 2 · Professional Services, Enterprise Innovation as ICP

Brand	Overall	Big 4 Innovation	PAD Points	PAD %
EY	80.18	74.32	-5.86	-7.3%
McKinsey	78.90	51.17	-27.73	-35.1%
Accenture	76.91	63.14	-13.77	-17.9%
Deloitte	74.62	77.79	+3.17	+4.2%
Bain	73.52	33.12	-40.40	-55.0%
KPMG	66.53	84.88	+18.35	+27.6%
PwC	63.67	68.41	+4.74	+7.4%
Grant Thornton	42.40	20.94	-21.46	-50.6%
BDO	32.64	20.85	-11.79	-36.1%

KPMG produces a +18.35 PAD (+27.6%) under this persona. McKinsey produces -27.73 PAD (-35.1%). Bain produces -40.40 PAD (-55.0%), the largest absolute PAD in the entire dataset. All three firms have global enterprise innovation practices. ChatGPT does not recommend McKinsey or Bain for that work when the persona is supplied.

Marketing teams at McKinsey and Bain optimizing against an aggregate AI visibility score are optimizing for a customer who does not exist. The audience that buys enterprise innovation consulting is asking ChatGPT with persona context. KPMG is winning that audience at 18 points, or 28 percent, above its own overall baseline.

Finding 3: Athletic Apparel, the Informed-Audience Penalty

In athletic apparel, recognition with a general audience does not predict recommendation strength with an informed athletic buyer. Lifestyle brands collapse under Athletic & Active conditioning. Asics gains.

Prompt: "What's the top sportswear and athletic apparel industry brands in the world?"

Persona · Athletic & Active

Reebok and Puma collapse. Asics gains.

I am an active person aged 18 to 35 who exercises at least 4 times per week. I am health-conscious with a mid-to-high income and tend to prioritize performance and quality.

Finding 3A · Athletic Apparel, Athletic & Active as ICP

Brand	Overall	Athletic & Active	PAD Points	PAD %
Nike	89.53	88.24	-1.29	-1.4%
Adidas	80.34	77.15	-3.19	-4.0%
Under Armour	72.19	60.23	-11.96	-16.6%
Asics	60.81	62.99	+2.18	+3.6%
Reebok	57.29	20.89	-36.40	-63.5%
Puma	52.84	34.73	-18.11	-34.3%

Persona · Young Professional

Every brand drops except Puma

I am an urban professional aged 22 to 32 with a college degree earning between $45,000 and $80,000 per year. I am career focused and either single or newly in a relationship.

Finding 3B · Athletic Apparel, Young Professional as ICP

Brand	Overall	Young Professional	PAD Points	PAD %
Nike	89.53	84.07	-5.46	-6.1%
Adidas	80.34	77.62	-2.72	-3.4%
Under Armour	72.19	56.28	-15.91	-22.0%
Asics	60.81	50.11	-10.70	-17.6%
Reebok	57.29	39.76	-17.53	-30.6%
Puma	52.84	55.23	+2.39	+4.5%

Reebok and Puma are present in the overall ranking but fall to the bottom of the Athletic & Active leaderboard (ranks 6 and 5 of 6), where they are likely to be absent from the answer their informed category buyers see. The Young Professional and the Athletic & Active customer are not the same buyer, and ChatGPT does not recommend the same brand to both.
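One way to see that the two personas are different buyers is to compare their scores directly, brand by brand, instead of comparing each to the overall baseline. The sketch below does that with the persona columns of Findings 3A and 3B; the persona-to-persona gap is not a metric defined in this paper, and the name `persona_gap` is introduced only for illustration.

```python
# Persona scores from Findings 3A and 3B: (Athletic & Active, Young Professional).
persona_scores = {
    "Nike": (88.24, 84.07),
    "Adidas": (77.15, 77.62),
    "Under Armour": (60.23, 56.28),
    "Asics": (62.99, 50.11),
    "Reebok": (20.89, 39.76),
    "Puma": (34.73, 55.23),
}

def persona_gap(brand: str) -> float:
    """Young Professional score minus Athletic & Active score, in points."""
    athletic, young_pro = persona_scores[brand]
    return round(young_pro - athletic, 2)

# Most negative gap first: brands that do better with the athletic buyer.
for brand in sorted(persona_scores, key=persona_gap):
    print(f"{brand}: {persona_gap(brand):+.2f}")
```

Puma and Reebok score roughly 19 to 21 points higher with the Young Professional than with the Athletic & Active buyer, while Asics moves about 13 points in the opposite direction.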

Finding 4: Cycling, Expert Audiences Reshape the Field

In cycling, the overall AiRR Score overstates most brands' positions with both the Avid Cyclist and Luxury Seeker personas. The Luxury Seeker persona produces the largest swings.

Prompt: "What are the top bicycle manufacturing brands in the United States?"

Finding 4A · Cycling, Avid Cyclist as ICP

Brand	Overall	Avid Cyclist	PAD Points	PAD %
Trek	91.87	84.18	-7.69	-8.4%
Specialized	81.81	78.12	-3.69	-4.5%
Cannondale	71.85	68.28	-3.57	-5.0%
Santa Cruz	65.14	67.14	+2.00	+3.1%
Giant	59.70	55.14	-4.56	-7.6%
Bianchi	36.98	36.54	-0.44	-1.2%
Felt	32.77	29.03	-3.74	-11.4%

Finding 4B · Cycling, Luxury Seeker as ICP

Brand	Overall	Luxury Seeker	PAD Points	PAD %
Trek	91.87	86.36	-5.51	-6.0%
Specialized	81.81	74.69	-7.12	-8.7%
Cannondale	71.85	58.57	-13.28	-18.5%
Santa Cruz	65.14	60.19	-4.95	-7.6%
Giant	59.70	31.88	-27.82	-46.6%
Bianchi	36.98	43.67	+6.69	+18.1%
Felt	32.77	22.02	-10.75	-32.8%

Specialized, Cannondale, and Giant all lose ground under Luxury Seeker conditioning. Bianchi inverts the pattern, gaining +6.69 (+18.1%) under Luxury Seeker even though its overall score is low. The persona conditions whether a brand's positioning is read as performance, prestige, or neither.

Finding 5: Time-Series Movement

Brand-level AI visibility moves at quarterly speed inside ChatGPT. Composite shifts of roughly 11 to 22 points over 55 to 104 days are visible in this dataset and exceed the sensitivity of standard brand-health trackers. All values are in AiRR Score points on the 0 to 100 scale, measured from the first day of measurement in this dataset to the most recent day.

Brand · Industry	Change	Days
Puma · Athletic Apparel	-21.94	104
Bianchi · Cycling	-19.66	64
Giant · Cycling	-14.27	64
JetBlue · Airlines	-13.15	69
Accenture · Professional Services	+14.78	55
Felt · Cycling	+11.48	64

A 20-point composite move on a 0 to 100 scale within 60 to 110 days is a structural shift. Public-market analysts following brands like Puma can expect AI visibility to lead reported brand-health metrics by one to two quarters.
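Because the measurement windows in the Finding 5 table differ in length, the moves are easier to compare as a rate. The sketch below normalizes each change to points per 30 days; it assumes the change was linear across the window, which the table itself does not establish, and `points_per_30_days` is a name chosen for this example.

```python
# (brand, change in AiRR Score points, days) from the Finding 5 table.
moves = [
    ("Puma", -21.94, 104),
    ("Bianchi", -19.66, 64),
    ("Giant", -14.27, 64),
    ("JetBlue", -13.15, 69),
    ("Accenture", 14.78, 55),
    ("Felt", 11.48, 64),
]

def points_per_30_days(change: float, days: int) -> float:
    """Average rate of change, assuming the move was linear over the window."""
    return round(change / days * 30, 2)

rates = {brand: points_per_30_days(change, days) for brand, change, days in moves}

# Fastest movers first, regardless of direction.
for brand, rate in sorted(rates.items(), key=lambda item: abs(item[1]), reverse=True):
    print(f"{brand}: {rate:+.2f} points per 30 days")
```

On this normalization Bianchi, not Puma, is the fastest mover: Puma has the largest total decline, but it accrued over a longer window.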

Section IV

The Future of Measurement: Brand × Prompt × Persona

The first generation of AI visibility tools reports a single visibility score per brand. The data in this paper shows that number is an average across audiences the tool did not see. A brand can win the overall score and lose every persona that buys.

The second generation adds prompts. Tools now report visibility at the prompt level. That is closer to useful but incomplete. The same prompt produces different brand recommendations under different personas. Prompt-level visibility without persona conditioning is the same averaging problem at smaller granularity.

The future of measurement is the triple: brand × prompt × persona. The specific brand, the specific prompt, and the specific persona of the buyer asking. Anything less reports the average, and the PAD documented in this paper reaches 40.40 points in a single persona shift.
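As a data structure, the triple is simply a three-part key. The sketch below is a hypothetical schema, not the AiRR platform's storage format; it also treats the Finding 2 scores as if they belonged to the single prompt shown there, which is a simplification made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    """One measurement key: brand x prompt x persona (hypothetical schema)."""
    brand: str
    prompt: str
    persona: str

PROMPT = "What are the top professional services brands in the United States?"

# Scores from Finding 2; the storage layout itself is an assumption.
scores = {
    Cell("McKinsey", PROMPT, "Overall"): 78.90,
    Cell("McKinsey", PROMPT, "Big 4 Innovation"): 51.17,
    Cell("KPMG", PROMPT, "Overall"): 66.53,
    Cell("KPMG", PROMPT, "Big 4 Innovation"): 84.88,
}

def lookup(brand: str, prompt: str, persona: str) -> float:
    """Fetch the score for one fully specified triple."""
    return scores[Cell(brand, prompt, persona)]

print(lookup("McKinsey", PROMPT, "Overall"))
print(lookup("McKinsey", PROMPT, "Big 4 Innovation"))
```

Keying on all three fields means no score can be read without naming the persona, which is the property a brand-only or brand-and-prompt key lacks.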

The CMO Paradox

A Chief Marketing Officer running their own search on their own ChatGPT account is not seeing what their customers see. The ChatGPT account that returns a high ranking for the CMO's brand is automatically baking in the CMO's persona: their job history, prior searches, location, and inferred preferences. The model could be telling the CMO what the model thinks the CMO wants to hear.

Consider a concrete case from the data. McKinsey holds an overall AiRR Score of 78.90 in professional services, ranked #2 of 9 firms. Its CMO opens ChatGPT, runs the category prompt, sees McKinsey near the top, and concludes McKinsey is competitive. McKinsey's ICP for its enterprise innovation practice is a Big 4 partner commercializing frontier tech, asking the question from inside their own firm. That persona scores McKinsey at 51.17, ranked #6. The PAD is -27.73 points (-35.1%). ChatGPT typically returns three to five brands per recommendation answer. The actual buyer sees four brands. McKinsey is not one of them. The CMO believes McKinsey is winning. The customer never sees the brand.

McKinsey is getting zero AI-influenced revenue from its target buyer while the marketing team is reporting #2 in the category.
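The four-brand cutoff in this example can be checked against the Finding 2 persona column. The sketch assumes that the brands named in an answer correspond to the top k persona scores; the AiRR Score is a composite, so this mapping is a simplification, and `named_brands` is a hypothetical helper.

```python
# Big 4 Innovation persona scores from Finding 2.
persona_scores = {
    "EY": 74.32, "McKinsey": 51.17, "Accenture": 63.14, "Deloitte": 77.79,
    "Bain": 33.12, "KPMG": 84.88, "PwC": 68.41,
    "Grant Thornton": 20.94, "BDO": 20.85,
}

def named_brands(scores: dict[str, float], k: int) -> list[str]:
    """Top-k brands by score, a stand-in for 'brands named in the answer'."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

top_four = named_brands(persona_scores, k=4)
print(top_four)
print("McKinsey" in top_four)
```

With k set to 4 the list is KPMG, Deloitte, EY, and PwC; McKinsey would enter only at k = 6.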

Section V

Implications

For marketers. The aggregate AiRR Score is a starting point, not an endpoint. Persona-level breakdowns reveal a different ChatGPT ranking for every customer segment a brand serves. The marketing team should read the PAD column for the persona that matches their declared ICP and treat that value as the actionable number. A negative PAD on the ICP persona means the brand is being oversold by overall visibility and the marketing team is operating on false confidence.

PAD is also a competitive metric. A brand can read its own PAD on its declared ICP alongside competitors' PAD on the same persona. A brand winning the ICP with +20 PAD while the top three competitors are all negative is the cleanest competitive signal the AI search layer produces.
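The competitive reading can be sketched as a simple check over one persona column. The code below uses the Finding 2 scores and interprets "top three competitors" as the three highest overall scores among the other brands; that interpretation, the absence of a +20 threshold, and the names `pad_points` and `clean_signal` are all assumptions of this example.

```python
# (overall, Big 4 Innovation) scores from Finding 2.
scores = {
    "EY": (80.18, 74.32),
    "McKinsey": (78.90, 51.17),
    "Accenture": (76.91, 63.14),
    "Deloitte": (74.62, 77.79),
    "Bain": (73.52, 33.12),
    "KPMG": (66.53, 84.88),
    "PwC": (63.67, 68.41),
    "Grant Thornton": (42.40, 20.94),
    "BDO": (32.64, 20.85),
}

def pad_points(brand: str) -> float:
    """Persona score minus overall score, in points."""
    overall, persona = scores[brand]
    return round(persona - overall, 2)

def clean_signal(brand: str, n: int = 3) -> bool:
    """True if the brand's PAD is positive and its top-n rivals' PADs are all negative."""
    rivals = sorted((b for b in scores if b != brand),
                    key=lambda b: scores[b][0], reverse=True)[:n]
    return pad_points(brand) > 0 and all(pad_points(r) < 0 for r in rivals)

print(pad_points("KPMG"), clean_signal("KPMG"))
print(pad_points("McKinsey"), clean_signal("McKinsey"))
```

For KPMG the three rivals with the highest overall scores (EY, McKinsey, Accenture) all carry negative PAD on this persona, so the check returns true; for McKinsey it returns false because its own PAD is negative.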

For investors and analysts. Brand visibility decay inside AI search appears to run faster than traditional brand-health indicators detect. Puma's 22-point decline over 104 days, the steepest in the dataset, is the kind of signal that public-market analysts will eventually price in.

For the GEO and AEO category. Tools that report a single brand-level visibility score are averaging across personas they do not measure. The fix is not abandoning measurement. It is measuring the structural variables (persona, prompt context, and time) that operators need to act on.

Quantified economic exposure. ChatGPT typically names only three to five brands per answer, so inclusion is close to winner-take-all and the distance between being named and being left out is the entire value of that segment. A brand holding a strong position in the aggregate while sitting near the bottom in the persona where its actual customers live is losing that segment to whichever competitors the model names instead. For an eight-segment business with persona-reversal exposure across half its segments, the unrecognized revenue at risk is substantial.

Section VI

The AiRR Persona Reversal Index

The data in this paper is the inaugural release of an ongoing publication. The AiRR Persona Reversal Index, derived from the methodology in Section II, will be updated on a recurring basis as the dataset expands across additional models, geographies, and industries. The index is intended for citation, reuse, and reproduction by independent journalists, analysts, and academic researchers. Methodology disclosures, raw data samples, and column definitions are available on request from steven@airrscore.com.

Conclusion · Steven Perlman, Founder & CEO, AI Reach Rank Inc.

The most-cited measurement in marketing is about to change

For two decades, the dominant question in brand search ranking was whether the brand showed up on the first page of Google, in a list of ten blue links. Marketing teams optimized against it, paid for it, and reported it to their boards. The 0 to 100 AiRR Score, as a standalone aggregate, is the AI-search equivalent of that ranking. It is useful as a summary. It is dangerous as a strategy.

The data in this paper documents Persona AiRR Drift of up to 40.40 points within a single brand when the audience persona shifts. Leaderboards invert at the top in professional services. Lifestyle athletic brands collapse with informed buyers. Cycling brands lose ground when the audience becomes a prestige buyer. A single overall AiRR Score reports the average of audiences the brand does not see, and that average can be off by more than 40 points on a 0 to 100 scale.

The unit of analysis must change. AI search recommendation is not a brand property. It is a property of the brand, the prompt, and the audience persona. The three together. Persona AiRR Drift is the measurement that makes the persona layer visible to marketing teams who would otherwise be operating on false confidence.

For independent measurement to mean anything, it has to be willing to publish what brands do not want to see. KPMG winning a buyer that the marketing team at McKinsey does not yet know it has lost. Frontier emerging in a persona where the overall score made it invisible. Puma collapsing while its lifestyle marketing continues. These observations are uncomfortable for the brands named. They are also where the value of the measurement lives.

Cite this research

APA citation

Perlman, S. (2026). The Persona Reversal: Why AI Brand Visibility Is Not About Your Brand. AiRR Research Series. AI Reach Rank Inc. https://doi.org/10.5281/zenodo.20288041

Sources