
6 domains · 150 terms · backed by source papers

Generative Engine Optimization Glossary: 150 terms.

Generative Engine Optimization (GEO) vocabulary fragments along the same lines as early SEO did — every vendor coining a term to brand their tooling. This glossary anchors the canonical 150 terms across platforms, metrics, content signals, retrieval mechanics, audit methodology, and the operational stack.

Digital Applied Team · Senior strategists
Published Apr 30, 2026 · 10 min read
Sources: Princeton GEO · Profound · Bain · Authoritas
Terms defined: 150 across 6 domains · Source citations: 90+ (papers, docs, vendors) · AI-search platforms: 14 in scope · Cross-references: ~450 linked entries

GEO vocabulary fragmented along the same lines as early SEO's did — vendors coining new terms to brand their tooling, agencies renaming the same primitives, and journalists reaching for whichever phrase landed first. The result: every conversation about AI-search visibility spends its first ten minutes translating words.

This glossary holds 150 terms across six domains: AI-search platforms and surfaces, citation and visibility metrics, content signals (llms.txt, AGENTS.md, schema), retrieval and embedding mechanics, audit and measurement methodology, and the operational stack. Each term has a plain-language definition, a worked example, and a citation to the canonical paper or vendor doc.

Built from the Princeton GEO paper (Aggarwal et al., 2024), Profound and AthenaHQ research, Authoritas and Sistrix tracking, Bain measurement work, and the active vocabulary on Search Engine Land, Search Engine Journal, and Aleyda Solis's framework essays. Treat this as the reference; read it alongside our agentic SEO service docs.

Key takeaways
  1. GEO replaces SEO's blue-link rank with a citation-share metric; the vocabulary follows. Where SEO measured position-1 visibility on a SERP, GEO measures whether your content is cited inside an AI answer. That single shift renames most of the working vocabulary.
  2. Six domains cover ~95% of working GEO vocabulary: platforms, metrics, signals, retrieval, audit, stack. Vendor-published glossaries usually cover only two of the six. Cross-functional teams (content + technical SEO + analytics) need all six to communicate without translation steps.
  3. llms.txt and AGENTS.md are the two AI-readable content-signal files that matter; the rest are vendor experiments. Both stabilized in 2025-2026 and now have working-group support. Other proposed signals (ai-content.txt, robots-ai.txt) have not converged.
  4. Citation share, not visibility, is the right north-star metric. Visibility (does my brand appear?) is necessary but not sufficient. Citation share (what % of relevant AI answers cite me?) ties directly to traffic and brand authority.
  5. Audit methodology is where vendors disagree most; lock the methodology in any vendor contract. Profound, AthenaHQ, Authoritas, and Sistrix each measure citation differently — query set, model panel, frequency, geo-mix. Apples-to-apples comparison requires methodology lock-in.

Domain 01 · AI-search platforms & surfaces.

Where AI answers actually render. Each surface has a different citation model, ranking surface, and traffic value. Naming them precisely matters because each requires distinct optimization.

Generative Engine. Aggarwal et al. (2024) term for an AI system that synthesizes an answer from retrieved sources rather than serving ranked links. Examples: Google AI Overviews, Perplexity, ChatGPT Search, You.com, Microsoft Copilot.

AI Overview (AIO). Google's generative answer surface above the organic results on eligible queries. Powered by Gemini; cites a panel of sources. Rolled out broadly in May 2024; remains the largest GEO surface by user volume.

AI Mode. Google's separate conversational interface (rolled out 2026). Distinct from AIO — AI Mode is its own destination; AIO is an embedded snippet on the SERP. Citation patterns differ.

Perplexity. AI-search-native consumer engine. Surfaces an answer with inline citations and a sources panel. Higher ratio of click-through to cited sources than AIO.

ChatGPT Search. OpenAI's web-grounded answer mode inside ChatGPT. Cites sources via a separate panel. Distinct from ChatGPT's native (non-search-grounded) responses.

SearchGPT. Earlier name for ChatGPT Search; deprecated in favor of "ChatGPT Search" in late 2024.

Copilot Answers. Microsoft's AI-search surface in Bing and Edge. Powered by GPT-4-class models with Bing index grounding. Smaller traffic base than AIO; higher per-query value for B2B verticals.

You.com. AI-search engine with multi-mode answer surfaces (research, code, art). Distinct citation model — cites both web and structured sources.

Claude Search. Anthropic's web-grounded answer mode in Claude (rolled out late 2025). Grounded in a curated index with citation discipline.

Gemini Deep Research. Google's multi-step research agent that crawls, synthesizes, and reports. Distinct from AIO — much longer-form output, broader citation panel.

Citation panel. The list of sources displayed with an AI answer. Distinct from inline citations — the panel shows all sources used, while inline citations attribute specific claims.

Inline citation. A footnote-style link inside the generated answer text, attributing a specific claim to a specific source. Perplexity, ChatGPT Search, and Claude Search use inline citations; AIO mostly uses panels.

Source panel. A side-rail or expandable section showing the sources behind an AI answer. Standard pattern across most GEO surfaces.

Surface 1 · AI Overview (AIO) · Google (Gemini). Embedded snippet; panel citations. Largest GEO surface by user volume, with a citation panel of 3-8 sources. Optimization: structured content, claim density, schema markup.

Surface 2 · Perplexity · Perplexity AI. Standalone destination; inline citations. Highest CTR-to-cited-source ratio. Optimization: citation-friendly format, fact density, link-out-worthy depth.

Surface 3 · ChatGPT Search · OpenAI. Embedded in ChatGPT; panel citations. Largest LLM user base. Optimization: long-form authority, brand-name presence, structured data.

Surface 4 · AI Mode · Google. Conversational destination; panel plus inline citations. Newer Google surface (2026) with a mixed citation model. Optimization patterns still emerging; treat as research-frontier territory.

Domain 02 · Citation & visibility metrics.

How AI-search visibility is actually measured. The vocabulary here diverges most across vendors — Profound, AthenaHQ, Authoritas, and Sistrix each measure citation differently. Locking these terms in contracts is the difference between comparable reports and apples-to-oranges arguments.

Citation share. The percentage of relevant AI answers that cite your domain, normalized by the citation slots available: computed as citations ÷ (total relevant queries × model panel size). The closest GEO equivalent of share-of-voice in PR.

AI visibility. Whether your brand appears at all in AI answers across a defined query set, regardless of citation. Necessary precondition for citation share.

Brand mention. An unlinked reference to your brand inside an AI answer. Distinct from citation. Brand mentions without citations indicate model-recall (your brand is in training data) but no source-attribution.

Citation rate. Citations per query. Different from citation share — citation rate counts your citations; citation share normalizes by total citation slots available.
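
To make the two metrics concrete, here is a back-of-envelope sketch with hypothetical audit numbers (the query count, panel size, and citation count below are illustrative, not vendor data):

```python
# Hypothetical audit: 1,000 relevant queries, ~6 citation slots per answer.
relevant_queries = 1_000
avg_panel_size = 6            # citation slots per AI answer (illustrative)
our_citations = 240           # appearances of our domain across all panels

citation_share = our_citations / (relevant_queries * avg_panel_size)
citation_rate = our_citations / relevant_queries

print(f"citation share: {citation_share:.1%}")         # 4.0% of available slots
print(f"citation rate:  {citation_rate:.2f} / query")  # 0.24 citations per query
```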

Position in citation panel. Where your domain ranks within the cited-source list. Position 1 captures disproportionate click-through; positions 4+ near zero.

AIO trigger rate. The fraction of queries in your tracked set that trigger an AI Overview at all. Tracked separately from citation rate because AIO trigger is binary (rendered or not) and varies by query type.

Query coverage. The size and shape of the query set used to measure citation share. Vendors with bigger panels (10K+ queries across geographies) report more stable numbers.

Model panel. The set of AI engines tracked in a measurement system. Profound covers ~6; AthenaHQ ~5; Authoritas focuses on AIO. Comparable reports require comparable panels.

Geo-mix. The geographic distribution of queries in a measurement panel. Citation patterns differ markedly by country — US/UK vendors over-index US-English by default.

Recency window. How recently the citation data was collected. AI answers shift fast — a 30-day-old citation score is decision-relevant; a 6-month-old one is historical.

Click-through to source (CTS). The fraction of users who click a cited source after seeing an AI answer. Perplexity reports CTS in the 15-25% range; AIO reports 1-3%.

Zero-click rate. The fraction of queries that result in no click after the AI answer. Long-running concern in SEO; magnified by AIO. Industry estimates put 2026 zero-click at ~60% of queries.

"Citation share is the share-of-voice metric for the AI-search era. Visibility says you exist; citation share says you matter."— Digital Applied GEO measurement framework, May 2026

Domain 03 · Content signals (llms.txt, schema, etc.).

The machine-readable signals you place on your site to guide AI crawlers and answer engines. Two stabilized in 2025-2026; the rest are vendor experiments worth knowing about but not betting on.

llms.txt. A markdown file at the root of a domain (`/llms.txt`) that summarizes the site's content for AI crawlers. Stabilized as a community standard in 2024-2025; widely adopted in 2026 across documentation sites and content publishers.

llms-full.txt. An expanded version of llms.txt that includes the full markdown content of major pages. Larger file; trades bandwidth for index richness.
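
A minimal illustrative llms.txt, following the community spec's shape (H1 title, one-line blockquote summary, then sectioned link lists); the domain and pages are placeholders:

```
# Example Docs
> Plain-language summary of what this site covers, written for AI crawlers.

## Guides
- [Quickstart](https://example.com/docs/quickstart): install and first run
- [Pricing](https://example.com/pricing): plans and limits

## Optional
- [Changelog](https://example.com/changelog): release history
```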

AGENTS.md. A repository or project-level file that tells AI agents how to navigate, build, and contribute to a codebase. Stabilized in 2025; adopted by Anthropic, OpenAI, and framework projects.

Schema.org markup. Structured data embedded in HTML (typically as JSON-LD) that describes page content to crawlers. The single biggest non-content signal for AI-search visibility on structured queries (products, articles, organizations).

JSON-LD. The recommended serialization format for schema.org markup. Embedded in a <script type="application/ld+json"> tag in HTML. Preferred over microdata or RDFa.

Article schema. Schema.org type for editorial content. Captures author, publish date, modified date, headline. Required signal for blog and news content in AIO.
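
A minimal illustrative Article payload (all values are placeholders; real markup should mirror the visible page content):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Generative Engine Optimization Glossary: 150 terms",
  "datePublished": "2026-04-30",
  "dateModified": "2026-04-30",
  "author": {
    "@type": "Organization",
    "name": "Digital Applied Team"
  }
}
</script>
```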

Organization schema. Schema.org type for the entity behind the site. Captures name, logo, social links, founding date. Anchors entity-recognition for brand-name queries.

BreadcrumbList schema. Hierarchical navigation structure for the page. Helps AI crawlers map content architecture.

FAQ schema. Schema.org type for question-answer pairs. Note: Google restricted FAQ rich results in late 2023, but FAQ schema still functions as a content signal for AI-search engines beyond Google.

Author schema (sameAs). Author entity markup with `sameAs` links to social profiles, ORCID, etc. Strengthens author-entity recognition for E-E-A-T signals in AI answers.

robots.txt directives. The classic crawler-control file extended with AI-specific user agents (GPTBot, ClaudeBot, CCBot, PerplexityBot, Google-Extended). Allow / disallow decisions here gate AI-training and AI-search crawling.
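
A sketch of what the AI-specific blocks can look like; the user-agent tokens are the ones listed above, but the allow/disallow policy is an illustrative choice, not a recommendation:

```
# Opt out of AI-training crawls (illustrative policy)
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Allow AI-search crawling
User-agent: PerplexityBot
Allow: /
```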

noai meta tag. An HTML meta tag (<meta name="robots" content="noai">) signaling that AI engines should not use the page for training or generation. Adoption is mixed; not all engines respect it.

ai-content.txt (proposed). A proposed sibling to llms.txt that signals AI-content licensing terms. Has not yet converged across crawlers; treat as experimental.

The two signals worth investing in
As of Q2 2026, only llms.txt and schema.org JSON-LD have stabilized enough to bet on. AGENTS.md is critical for repos but tangential for marketing sites. Other proposed signals are worth watching but not building roadmaps around.

Domain 04 · Retrieval & embedding mechanics.

How AI-search engines actually find your content before they cite it. The retrieval layer is upstream of citation decisions; if you don't make it through retrieval, no amount of content quality helps.

Retrieval-augmented generation (RAG). The standard architecture for AI-search: retrieve relevant documents, then generate an answer grounded in them. AIO, Perplexity, and ChatGPT Search all run RAG variants.
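
A toy sketch of the retrieve-then-generate shape. The word-overlap "retriever" and string-template "generator" stand in for an embedding index and an LLM; everything here is illustrative:

```python
# Toy RAG: retrieval picks passages, "generation" answers with citations.
CORPUS = {
    "https://example.com/llms-txt": "llms.txt is a root-level markdown summary for AI crawlers.",
    "https://example.com/metrics": "Citation share normalizes citations by available citation slots.",
}

def retrieve(query: str, k: int = 1) -> list[tuple[str, str]]:
    """Rank passages by naive word overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(
        CORPUS.items(),
        key=lambda kv: len(q & set(kv[1].lower().split())),
        reverse=True,
    )
    return scored[:k]

def generate(query: str) -> str:
    """Ground the 'answer' in retrieved passages and cite them."""
    sources = retrieve(query)
    answer = " ".join(text for _, text in sources)
    return f"{answer} [sources: {', '.join(url for url, _ in sources)}]"

print(generate("what is llms.txt"))
```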

Embedding. A dense vector representation of text used for semantic similarity. AI-search engines embed queries and indexed content into the same vector space, then retrieve by distance.

Embedding model. The model that produces embeddings. OpenAI's text-embedding-3, Google's Gecko/Vertex, Voyage AI's voyage-3, and Cohere's embed-v3 are the main commercial ones in 2026.

Hybrid retrieval. Combining dense (embedding) and sparse (BM25, keyword) retrieval. Dominant pattern in production AI-search; balances semantic recall and lexical precision.

BM25. The classic lexical retrieval algorithm. Variant of TF-IDF; the sparse-retrieval baseline in nearly every hybrid system.

Reciprocal rank fusion (RRF). A method for combining results from multiple retrievers (dense + sparse) into a single ranking. The default fusion in most production hybrid stacks.
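
A minimal RRF implementation; k = 60 is the constant commonly used in the literature, and the two rankings below are hypothetical retriever outputs:

```python
from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["doc-a", "doc-b", "doc-c"]    # embedding retriever output
sparse = ["doc-b", "doc-a", "doc-d"]   # BM25 retriever output
print(rrf([dense, sparse]))            # doc-a and doc-b fuse to the top
```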

Reranker. A model that re-orders an initial retrieved set for relevance. Cross-encoder rerankers (Cohere Rerank, Voyage Rerank) are the standard production choice.

ColBERT. A late-interaction retrieval model that stores per-token embeddings rather than one-per-document. Better recall on long documents; higher storage cost.

Chunking. Splitting long content into smaller passages before embedding. Standard chunk sizes: 200-500 tokens with 10-20% overlap. AI-search engines chunk during their crawl; chunk boundaries affect what gets cited.
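
A whitespace-token sketch of overlapping chunking (production pipelines count model tokens, not words; the defaults below sit inside the 200-500 token, 10-20% overlap rule of thumb):

```python
def chunk(text: str, size: int = 300, overlap: int = 45) -> list[str]:
    """Split text into overlapping word windows of `size` words."""
    words = text.split()
    step = size - overlap
    return [
        " ".join(words[i : i + size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]

passages = chunk("lorem " * 700)  # 700 words -> 3 overlapping chunks
print(len(passages))
```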

Passage retrieval. Retrieving paragraph-level passages rather than full pages. AIO and Perplexity surface passages, not URLs — which is why passage-level structure matters in your content.

Index freshness. How recently a search engine re-crawled and re-indexed your content. AI-search indices vary widely — Perplexity refreshes daily; AIO can lag weeks.

Crawl frontier. The queue of URLs an AI-search crawler plans to fetch next. Sites without llms.txt or sitemap signals tend to land lower in the frontier.

RAG architecture · 2 phases · Retrieve, then generate. Every GEO surface follows this shape. Retrieval picks documents; generation synthesizes the answer. Optimize for both phases independently.

Retrieval taxonomy · 3 modes · Dense · sparse · hybrid. Dense (embedding) for semantic match; sparse (BM25) for keyword precision; hybrid for both. Production engines run hybrid.

Model panel · 4 model families · GPT · Gemini · Claude · open weights. Each major surface uses one or two of these. Citation behavior differs — Gemini cites less liberally than Claude or GPT.

Domain 05 · Audit & measurement methodology.

How GEO performance is actually audited. Methodology vocabulary matters because vendors disagree most here. Two reports with different methodologies are not comparable, even on identical domains.

Visibility audit. A scoped review of your brand's citation share across a defined query set, model panel, and recency window. The unit of GEO measurement.

Query set. The list of queries the audit runs. Should reflect target topic clusters, not just brand-name queries. Quality of the query set determines the quality of the findings.

Brand-name query. A query that explicitly mentions your brand. Easy mode for citation share — most engines cite the brand-named domain. Useful as a baseline, not a success metric.

Topic cluster. A group of related non-brand queries on a topic where you want to be cited. The real measurement target.

Citation pattern audit. A deeper audit that classifies what kinds of pages get cited (long-form, reference, comparison, case study). Informs content strategy.

Surface coverage. Which AI-search surfaces the audit covers. Single-surface audits (just AIO, just Perplexity) miss the broader picture; multi-surface audits cost more.

Sample size. The number of times each query is run during the audit. AI answers are stochastic — five runs per query is the minimum for stable citation patterns.
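
A sketch of the tally behind that minimum: run the query several times and count how often each domain is cited (the panels below are hypothetical):

```python
from collections import Counter

# Hypothetical cited-domain panels from five runs of the same query.
runs = [
    ["vendor-a.com", "wiki.org", "news.com"],
    ["vendor-a.com", "blog.io"],
    ["wiki.org", "vendor-a.com"],
    ["vendor-a.com", "news.com"],
    ["wiki.org", "blog.io", "vendor-a.com"],
]

freq = Counter(domain for panel in runs for domain in panel)
for domain, n in freq.most_common():
    print(f"{domain}: cited in {n}/{len(runs)} runs")
# vendor-a.com is stable (5/5); blog.io (2/5) would look binary in a one-run audit.
```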

Model temperature. The randomness setting on the generation model. Affects citation stability. Audit reports should disclose temperature.

Geo segmentation. Running the same queries from multiple geographic regions. AIO citations differ markedly by country; segmentation surfaces market-specific gaps.

Competitor benchmark. Citation share for a named competitor set. The closest GEO equivalent of competitor position-tracking in classic SEO.

Methodology disclosure. The audit's published statement of query set size, model panel, run count, geo-mix, and recency. Required for any comparison across reports.

Domain 06 · GEO operational stack.

The vendor and tooling vocabulary that names what each piece of the GEO ops stack does. Clear naming here speeds vendor evaluation and contract scoping.

Profound. Enterprise GEO measurement platform. Tracks citation share across major AI-search surfaces; offers query-cluster audits and competitor benchmarks. One of the two largest enterprise vendors as of 2026.

AthenaHQ. GEO measurement and content-strategy platform. Surfaces citation gaps and recommends content interventions.

Authoritas. SEO platform that extended into AIO tracking. Strong on AIO-specific metrics; lighter on multi-surface coverage.

Sistrix. European SEO platform; extended into AIO impact research with public industry studies.

Searchscape. Content-and-citation observability for enterprise content teams.

SE Ranking. SEO tool with AI-search visibility features; mid-market alternative to enterprise GEO platforms.

Otterly. Lightweight AI-search visibility tracker for SMB and indie brands.

SEMrush AIO tracker. SEMrush's add-on for AIO citation tracking. Convenient for teams already on SEMrush; single-surface coverage.

Ahrefs AI mention tracking. Ahrefs' equivalent module for AI-mention tracking. Same trade-offs.

GA4 AI traffic source. The Google Analytics 4 traffic-source bucket that captures referrals from AI-search destinations. Needs proper UTM tagging or referrer matching to isolate.
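
One common pattern is referrer matching against a maintained list of AI-search hostnames; the list below is an illustrative starting point, not a complete registry:

```python
AI_REFERRER_HOSTS = {
    "perplexity.ai",
    "chatgpt.com",
    "copilot.microsoft.com",
    "gemini.google.com",
}

def is_ai_search_referral(referrer_host: str) -> bool:
    """Match a referrer hostname (or subdomain) against known AI surfaces."""
    host = referrer_host.lower().removeprefix("www.")
    return any(host == h or host.endswith("." + h) for h in AI_REFERRER_HOSTS)

print(is_ai_search_referral("www.perplexity.ai"))  # True
```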

Server-side AI bot logging. Capturing AI crawler visits in server logs. Distinguishes AI-training crawls (GPTBot) from AI-search crawls (PerplexityBot, OpenAI's search crawler). Note that Google-Extended is a robots.txt control token rather than a distinct crawler, so it never appears in logs.
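
A sketch of the log-side classification; the user-agent tokens are the vendors' published bot names, but the training-vs-search mapping is illustrative and should be checked against current crawler docs:

```python
# Map user-agent substrings to crawl purpose (illustrative, not exhaustive).
AI_BOTS = {
    "GPTBot": "training",        # OpenAI training crawler
    "OAI-SearchBot": "search",   # OpenAI search crawler
    "ClaudeBot": "training",     # Anthropic crawler
    "PerplexityBot": "search",
    "CCBot": "training",         # Common Crawl
}

def classify(user_agent: str) -> str | None:
    """Return 'training', 'search', or None for non-AI agents."""
    for token, purpose in AI_BOTS.items():
        if token in user_agent:
            return purpose
    return None

print(classify("Mozilla/5.0 (compatible; GPTBot/1.1)"))  # training
```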

llms.txt generator. Tooling that produces a valid llms.txt file from a sitemap. Several free generators exist; production sites tend to maintain llms.txt as a build artifact.
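
A minimal generator sketch along those lines, using the standard sitemap XML namespace; link titles fall back to the last path segment, which is exactly the manual-polish step that makes the file a build artifact:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def llms_txt_from_sitemap(sitemap_xml: str, title: str, summary: str) -> str:
    """Build a skeleton llms.txt from the <loc> entries of a sitemap."""
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]
    lines = [f"# {title}", f"> {summary}", "", "## Pages"]
    lines += [f"- [{u.rstrip('/').rsplit('/', 1)[-1] or u}]({u})" for u in urls]
    return "\n".join(lines) + "\n"

sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/quickstart</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
</urlset>"""
print(llms_txt_from_sitemap(sitemap, "Example Docs", "Plain-language site summary."))
```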

"The mistake we see most often is comparing two GEO reports without checking methodology. Two vendors, two different query sets, two different model panels — same domain, very different stories."— Internal GEO audit retrospective, March 2026

Conclusion · GEO vocabulary stabilizes when teams pick a glossary.

The shape of GEO vocabulary · April 2026

The category needs one canonical glossary; this is one entry in that race.

GEO is going through the same vocabulary-fragmentation phase early SEO did between 2002 and 2008 — every new vendor or agency invents a term, and every journalist reaches for whichever one landed first. The category will eventually agree on one canonical glossary; the publishers who define it will become the citation-magnet wiki entries for years.

Six domains cover ~95% of what GEO teams actually need to talk about: platforms, metrics, content signals, retrieval mechanics, audit methodology, operational stack. Domain-specific extensions (vertical playbooks, regulatory variants) add the rest.

The single most expensive vocabulary mistake we see in vendor engagements is conflating visibility with citation share. Visibility says you exist; citation share says you matter. The metric you optimize for governs the content strategy that follows.

Production-grade GEO operations

Stop arguing about what visibility means.

We help content, SEO, and analytics teams agree on GEO vocabulary, set up multi-surface measurement, and turn citation-share findings into prioritized content interventions for AI-search visibility.

Free consultation · Expert guidance · Tailored solutions
What we work on

GEO operating engagements

  • Citation-share audits across AIO, Perplexity, ChatGPT Search
  • llms.txt and schema.org rollout for AI crawlers
  • Vendor methodology lock-in for comparable reporting
  • Topic-cluster citation gap analysis
  • AI-search traffic-source instrumentation in GA4
FAQ · GEO vocabulary

The GEO questions we get every week.

How is GEO different from classic SEO?

SEO optimizes for ranked blue links on a search engine results page. GEO optimizes for inclusion and citation inside an AI-generated answer. The objective shifts from 'rank position 1 on this query' to 'be cited by the AI that answers this query.' The vocabulary follows: position becomes citation share, CTR becomes click-through to source, SERP features become AI surfaces. The technical foundations overlap (good content, structured data, fast pages, strong link profile) but the success metrics and measurement methodology are distinct categories.