The short answer
AI crawlability is the ability of AI search engines (ChatGPT, Perplexity, Google AI Overviews, Gemini, Bing Copilot, Google AI Mode, Claude, Grok, Meta AI, and DeepSeek) to technically access, read, and process your website's content. It is distinct from traditional SEO crawlability and depends on four core technical factors: (1) crawler access permissions, (2) server-side rendering, (3) schema markup quality, and (4) site architecture. A fifth factor, llms.txt, is an emerging standard worth implementing but has no measured impact on AI citations today. A failure in any of the four core factors can make your content invisible to AI, regardless of its quality or Google ranking.
The scale of the problem is significant. Roughly 21% of the top 1,000 websites now block GPTBot in robots.txt (the file that tells crawlers which pages they may visit), often without the site owner knowing it. Six of eight major AI crawlers cannot execute JavaScript, so any single-page application built with React, Vue, or Angular that renders its content only in the browser may appear as a blank page to these systems. And sites that implement schema markup but leave it generic are actually cited less often (41.6%) than sites with no schema at all (59.8%).
Your site can have excellent content, strong Google rankings, and a recognized brand and still be completely invisible to AI search. The cause is almost always technical. Most businesses don't know they have these problems because they've never checked.
This article covers the technical access layer of Generative Engine Optimization (GEO), one of five dimensions in the Aivarize GEO Scoring Index. Content quality and brand authority matter too, but they're irrelevant if AI can't reach your site in the first place.
Is your site blocking AI crawlers?
AI search platforms send their own crawlers to read your website, separate from Googlebot. OpenAI sends GPTBot, OAI-SearchBot, and ChatGPT-User. Anthropic sends ClaudeBot and Claude-SearchBot. Perplexity sends PerplexityBot. Well-behaved crawlers check your robots.txt file for permission before reading your content, although, as covered below, compliance is not universal.
Here's the problem: approximately 21% of the top 1,000 websites now include GPTBot rules in their robots.txt, and 79% of top news sites block at least one AI training bot. Many businesses blocked these crawlers without realizing it, often through overly broad robots.txt rules, security plugins that block unknown bots, or hosting configurations that default to restrictive settings.
If GPTBot and OAI-SearchBot are both blocked, your content is far less likely to appear in ChatGPT responses. If PerplexityBot is blocked, Perplexity is unlikely to cite you. Blocking doesn't always prevent visibility entirely, because indirect pathways exist (covered below), but it removes the primary route AI crawlers use to find and cite your content.
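You can also run the robots.txt check with a script. Python's standard-library urllib.robotparser can report which AI user agents a given file disallows. Treat this as an illustrative sketch. The crawler list contains only the tokens named in this article, and the function name is hypothetical. Also note that robotparser applies the original robots.txt matching rules (first matching group, substring user-agent matching). A particular crawler may resolve conflicting Allow and Disallow lines differently.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in this article; not an exhaustive list of AI crawlers.
AI_CRAWLERS = [
    "GPTBot",
    "OAI-SearchBot",
    "ChatGPT-User",
    "ClaudeBot",
    "Claude-SearchBot",
    "PerplexityBot",
]


def blocked_crawlers(robots_txt: str, path: str = "/") -> list[str]:
    """Return the AI crawlers that this robots.txt text disallows from fetching `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, path)]


if __name__ == "__main__":
    sample = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    print(blocked_crawlers(sample))  # ['GPTBot']
```

To check a live site, you can have RobotFileParser fetch the file itself: call set_url("https://yourdomain.com/robots.txt") and read() instead of parse().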
Aivarize's AI Crawler Tier Model
Not all AI crawlers serve the same purpose. Aivarize's AI Crawler Tier Model classifies them into three strategic tiers:
Training crawlers collect data to train AI models. GPTBot, ClaudeBot, CCBot, and Google-Extended fall into this category (strictly speaking, Google-Extended is a robots.txt control token that Google honors rather than a separate crawler). Blocking these prevents your content from entering future model training but has no retroactive effect on models already trained. Content used during training has already shaped the model's learned representations and may persist indefinitely.
Retrieval/search crawlers build the real-time indexes AI uses to answer questions. OAI-SearchBot (ChatGPT search), Claude-SearchBot, PerplexityBot, and Bingbot serve this function. Blocking these has immediate impact on whether you appear in AI responses.
User-initiated crawlers fetch pages when a user explicitly asks the AI to browse a URL. ChatGPT-User and Perplexity-User fall here.
The following table summarizes the major AI crawlers, their function, and key technical behavior as of March 2026:
| Crawler | Operator | Tier | Renders JS? | Respects robots.txt? |
|---|---|---|---|---|
| GPTBot | OpenAI | Training | No | Yes |
| OAI-SearchBot | OpenAI | Retrieval | No | Yes |
| ChatGPT-User | OpenAI | User-initiated | No | Partially (~42% bypass) |
| ClaudeBot | Anthropic | Training | No | Yes |
| Claude-SearchBot | Anthropic | Retrieval | No | Yes |
| PerplexityBot | Perplexity | Retrieval | No | Yes |
| Googlebot | Google | Retrieval (AI Overviews) | Yes | Yes |
| Bingbot | Microsoft | Retrieval (ChatGPT backend) | Partial | Yes |
OpenAI reclassified ChatGPT-User outside robots.txt governance in December 2025. This means ChatGPT-User no longer fully complies with robots.txt blocking directives. TollBit documented a 42% bypass rate. For publishers relying on robots.txt as their primary AI access control, this reclassification effectively undermined the mechanism.
The strategic decision is not all-or-nothing. You can allow retrieval crawlers that drive citations while blocking training-only crawlers. Aivarize's technical audits include a 12-crawler access map covering all three tiers.
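You can express a selective policy directly in robots.txt. The sketch below generates one from a tier map that mirrors the Crawler Tier Model; the CCBot, Google-Extended, and Perplexity-User entries come from the tier descriptions rather than the table. It then checks the result with urllib.robotparser. The function name and the default choice of blocking only the training tier are illustrative assumptions. Confirm each operator's current user-agent tokens before deploying anything like this, and remember that robots.txt is advisory.

```python
from urllib.robotparser import RobotFileParser

# Tier map mirroring the Crawler Tier Model. Google-Extended is a robots.txt
# control token rather than a separate crawler, but it is written the same way.
CRAWLER_TIERS = {
    "GPTBot": "training",
    "ClaudeBot": "training",
    "CCBot": "training",
    "Google-Extended": "training",
    "OAI-SearchBot": "retrieval",
    "Claude-SearchBot": "retrieval",
    "PerplexityBot": "retrieval",
    "ChatGPT-User": "user-initiated",
    "Perplexity-User": "user-initiated",
}


def selective_robots_txt(block_tiers: frozenset[str] = frozenset({"training"})) -> str:
    """Build robots.txt text that disallows crawlers in `block_tiers` and allows the rest."""
    groups = []
    for agent, tier in CRAWLER_TIERS.items():
        rule = "Disallow: /" if tier in block_tiers else "Allow: /"
        groups.append(f"User-agent: {agent}\n{rule}")
    # Crawlers not named above (for example Googlebot and Bingbot) fall through to this group.
    groups.append("User-agent: *\nAllow: /")
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    robots = selective_robots_txt()
    print(robots)

    # Sanity-check the generated file with the standard-library parser.
    parser = RobotFileParser()
    parser.parse(robots.splitlines())
    for agent in ("GPTBot", "OAI-SearchBot", "Googlebot"):
        print(agent, "allowed" if parser.can_fetch(agent, "/pricing") else "blocked")
```

Training crawlers get Disallow lines, while retrieval and user-initiated crawlers stay allowed. Per the table, ChatGPT-User only partially honors these rules anyway.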
Does blocking AI crawlers actually work?
Even when you block crawlers, indirect pathways exist. Research from Columbia's Tow Center found that publishers blocking AI crawlers continue to appear in AI responses through search index intermediaries, content syndication, and persistent training data.
TollBit has documented the share of AI bot requests that ignore robots.txt rising from 3.3% in late 2024 to about 13% by Q1 2025, and reaching 30% by late 2025, with ChatGPT-User at a 42% bypass rate.
The implication: blocking is not a reliable gatekeeping mechanism. It's a signal of intent, not a guarantee of exclusion. For most businesses, the better strategy is selective access rather than blanket blocking.
Is your content trapped in JavaScript?
This is the most common technical cause of AI invisibility, and it catches many modern websites.
Most AI crawlers don't execute JavaScript. When one visits your site, it reads the raw HTML the server sends: before any React components render, before any client-side code runs, before any dynamic content loads. In 2025, Vercel analyzed hundreds of millions of GPTBot page fetches, along with requests from other AI crawlers. It observed no JavaScript execution at all by any of the major AI crawlers tested, including GPTBot, ClaudeBot, AppleBot, and PerplexityBot. The exceptions are Googlebot (and by extension Gemini), which inherits Google's rendering infrastructure, and Bingbot, which has partial JavaScript rendering capability.
If your site is a single-page application (SPA) built with React, Vue, Angular, or a similar framework, and it renders its content only in the browser, these crawlers see an empty shell instead of your content.
You can test this yourself. View the source code of your page (right-click, then View Page Source in any browser). If you see your actual content (headings, paragraphs, and text) in the HTML, your site is likely server-rendered and AI crawlers can read it. If you see mostly JavaScript bundles and an empty <div id="root">, your content is client-rendered and most AI crawlers are seeing a blank page.
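You can also script the view-source test. This sketch fetches a page with urllib, which, like most AI crawlers, never executes JavaScript. It then counts the words of readable text outside <script> and <style> elements. The helper names and the User-Agent string are hypothetical. A word count is only a rough proxy for whether content is present, and some servers return different HTML depending on the User-Agent header. Treat the result as a first signal rather than a verdict.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class VisibleTextParser(HTMLParser):
    """Collect words a non-rendering crawler could read from raw server HTML."""

    SKIP = {"script", "style"}  # contents of these elements are not readable page text

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.words: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words.extend(data.split())


def readable_word_count(raw_html: str) -> int:
    """Count words of text outside <script> and <style> in the HTML as served."""
    parser = VisibleTextParser()
    parser.feed(raw_html)
    parser.close()
    return len(parser.words)


def fetch_raw_html(url: str) -> str:
    """Fetch HTML without running JavaScript (urllib never executes scripts)."""
    request = Request(url, headers={"User-Agent": "raw-html-check/0.1"})  # hypothetical UA
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    spa_shell = '<html><body><div id="root"></div><script src="/bundle.js"></script></body></html>'
    server_rendered = "<html><body><h1>Pricing</h1><p>Our plans start at ten euros per month.</p></body></html>"
    print(readable_word_count(spa_shell))        # 0
    print(readable_word_count(server_rendered))  # 9
```

A key page that scores near zero words, as when readable_word_count(fetch_raw_html(url)) returns 0, is a strong hint that the content only appears after JavaScript runs.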
The solution is server-side rendering (SSR) or static site generation (SSG). Frameworks like Next.js, Nuxt, and SvelteKit support both approaches, and pre-rendering services can generate static HTML for sites that can't easily migrate. This isn't just a GEO issue; it affects traditional SEO too. But the impact on AI citation is more severe: most AI crawlers get nothing at all from a client-rendered page, whereas Googlebot has learned to render JavaScript over the years.
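Static site generation simply means producing finished HTML at build time. The sketch below renders hypothetical page data into complete HTML files, so headings and text exist in the server response before any JavaScript runs. It is framework-agnostic plain Python, not how Next.js, Nuxt, or SvelteKit work internally. The page data, template, and function names are illustrative assumptions.

```python
import html
import tempfile
from pathlib import Path

# Hypothetical page data; a real site would load this from a CMS or content files.
PAGES = {
    "index": {"title": "Example Analytics", "body": "Reporting software for finance teams."},
    "pricing": {"title": "Pricing", "body": "Plans start at 49 euros per month."},
}

TEMPLATE = """<!doctype html>
<html lang="en">
<head><meta charset="utf-8"><title>{title}</title></head>
<body>
<main>
<h1>{title}</h1>
<p>{body}</p>
</main>
</body>
</html>
"""


def render_page(page: dict[str, str]) -> str:
    """Render one page to complete HTML, escaping text so it cannot break the markup."""
    return TEMPLATE.format(title=html.escape(page["title"]), body=html.escape(page["body"]))


def build_site(out_dir: Path) -> list[Path]:
    """Write every page as a static .html file at build time and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for slug, page in PAGES.items():
        path = out_dir / f"{slug}.html"
        path.write_text(render_page(page), encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    # Build into a throwaway directory so the demo leaves no files behind in the project.
    for path in build_site(Path(tempfile.mkdtemp())):
        print("wrote", path)
```

Because each file already contains its text, a crawler that never runs JavaScript still receives the full content.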
Does your site have an llms.txt file?
llms.txt is a proposed protocol that gives AI systems a structured overview of your website. It sits at your site root (like robots.txt) and tells AI what your site is about, what your most important pages are, and how your content is organized. Think of it as a cover letter for your website, designed for AI to read before deciding whether to dig deeper.
Adoption remains very low. Among the top 1,000 websites, implementation is effectively zero. Across broader datasets, SE Ranking found about 10% of ~300,000 domains had one, still a small minority. That's both a problem and an opportunity.
The file follows a simple format: a title, a description, and categorized links to your key pages with brief descriptions of each.
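Here is a minimal sketch of generating the file. It assumes the layout described in the llms.txt proposal: an H1 title, a blockquote summary, then H2 sections listing markdown links with short notes. The function name, section names, and example.com URLs are hypothetical.

```python
def build_llms_txt(title: str, summary: str,
                   sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Assemble llms.txt text: H1 title, blockquote summary, then H2 sections of annotated links."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical site; replace with your own pages.
    print(build_llms_txt(
        "Example Analytics",
        "Reporting software for finance teams, based in Helsinki.",
        {
            "Product": [
                ("Features", "https://example.com/features", "What the software does"),
                ("Pricing", "https://example.com/pricing", "Plans and prices"),
            ],
            "Company": [
                ("About", "https://example.com/about", "Team, history, and contact details"),
            ],
        },
    ))
```

Save the output as llms.txt at the site root, next to robots.txt.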
Aivarize's position on llms.txt is nuanced. As of early 2026, no major AI crawler has been observed actually requesting llms.txt files in server logs. The protocol remains aspirational rather than functional.
But early adoption costs almost nothing and positions you ahead of the curve. Google's involvement in the IETF AIPREF working group also suggests that formal standards for AI-content interactions are moving forward, even if llms.txt itself hasn't gained platform adoption. AIPREF is a separate but thematically related effort to standardize how sites express AI usage preferences. Aivarize includes llms.txt assessment in its technical audits because the effort-to-potential-upside ratio is favorable.
Find out where your business stands. We score brands across all five GEO dimensions and show you exactly where you're visible, where you're not, and what to do about it. Get your GEO visibility score →
Does your site speak the language AI understands?
Schema markup, structured data in JSON-LD format embedded in your pages, describes what your content means, not just what it says. Without it, an AI has to infer from the page text that it's about a software company. With it, the page states explicitly that it's about a software company based in Finland that offers specific services.
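The sketch below illustrates the "software company based in Finland" example by building an Organization JSON-LD block. The property names (description, foundingDate, address, sameAs, makesOffer) are standard schema.org vocabulary. The company, its values, and the helper function are hypothetical. The helper drops empty values, so unfilled placeholders never reach the page.

```python
import json


def organization_jsonld(**props) -> str:
    """Return a JSON-LD <script> tag for a schema.org Organization, omitting empty properties."""
    data = {"@context": "https://schema.org", "@type": "Organization"}
    data.update({key: value for key, value in props.items() if value})
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    print(organization_jsonld(
        name="Example Software Oy",  # hypothetical company
        url="https://example.com",
        description="B2B reporting software company based in Helsinki, Finland.",
        foundingDate="2015",
        address={"@type": "PostalAddress", "addressLocality": "Helsinki", "addressCountry": "FI"},
        sameAs=["https://www.linkedin.com/company/example"],
        makesOffer=[{"@type": "Offer",
                     "itemOffered": {"@type": "Service", "name": "Reporting automation"}}],
        logo="",  # left empty on purpose: dropped instead of emitted as a placeholder
    ))
```

Paste the printed tag into the page's HTML, ideally in the <head>.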
Pages with structured data appear more frequently in AI citations: approximately 65% of pages cited by Google's AI Mode and 71% of pages cited by ChatGPT include schema markup. These are correlations, not proof that schema causes citation. In fact, a white paper found that schema's independent effect on citation is statistically insignificant (OR=0.678, p=.296).
Moreover, no AI system currently parses JSON-LD semantically at the generation layer. They read it as text, not as structured data. The benefit likely comes from schema making your content clearer and more parseable as text, not from AI understanding the schema format itself.
Generic schema harms citation rates
Sites with boilerplate, unfilled schema showed lower citation rates (41.6%) than sites with no schema at all (59.8%). Only attribute-rich, properly completed schema improved citation odds (61.7%), according to Growth Marshal research. The takeaway: if you're going to implement schema, do it properly or don't do it at all.
The most impactful schema types for GEO are Article, FAQPage, HowTo, Product, Recipe, and VideoObject. Generic types like Organization, WebSite, WebPage, Corporation, and Person score below "no schema" unless they are attribute-rich. Beyond page-level schema, your connection to Wikidata and Google's Knowledge Graph matters. These are the structured databases AI platforms use to verify entity information.
In Aivarize's audits, a frequent pattern is sites with generic Organization schema that include only a name and URL, missing description, founding date, service offerings, and social profiles. That kind of bare-minimum implementation doesn't help. It actively hurts.
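One rough way to spot this bare-minimum pattern is to extract every application/ld+json block from a page. You then flag top-level objects that contain nothing beyond a name and URL. This is a heuristic sketch, not a schema validator. The BARE_KEYS set and the function names are assumptions, and the sketch does not inspect nested entities.

```python
import json
from html.parser import HTMLParser

# Keys that add no descriptive detail on their own (an assumption for this heuristic).
BARE_KEYS = {"@context", "@type", "@id", "name", "url"}


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def sparse_schema(raw_html: str) -> list[str]:
    """List the @type of each top-level JSON-LD object with nothing beyond name/URL.

    Blocks that fail to parse are reported as "invalid JSON-LD".
    """
    extractor = JsonLdExtractor()
    extractor.feed(raw_html)
    extractor.close()
    findings = []
    for block in extractor.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            findings.append("invalid JSON-LD")
            continue
        if isinstance(data, list):
            items = data
        elif isinstance(data, dict):
            items = data.get("@graph", [data])  # unwrap @graph containers
        else:
            items = []
        for item in items:
            if isinstance(item, dict) and set(item) <= BARE_KEYS:
                findings.append(str(item.get("@type", "unknown")))
    return findings


if __name__ == "__main__":
    page = ('<script type="application/ld+json">{"@context": "https://schema.org", '
            '"@type": "Organization", "name": "Acme", "url": "https://acme.example"}</script>')
    print(sparse_schema(page))  # ['Organization']
```

An empty result means no bare objects were found. It does not mean the schema is complete or correct.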
Site architecture for AI crawlers
Site architecture affects how efficiently AI crawlers discover and prioritize your pages, even after the major access issues are resolved. In the Aivarize GEO Scoring Index, these factors span two dimensions: AI Discoverability (SSR, sitemaps) and Technical Foundation (internal linking, page speed).
| Factor | Dimension | What it affects | What to do |
|---|---|---|---|
| Server-side rendering | AI Discoverability | Everything. If your site renders content client-side, none of your other technical optimizations matter. | SSR is the prerequisite. Confirm it's in place before optimizing anything else. |
| XML sitemaps | AI Discoverability | Crawl efficiency. A comprehensive, up-to-date sitemap helps AI crawlers find your content without relying on link-following alone. | Basic SEO hygiene, but it directly benefits AI discoverability too. Keep it current. |
| Internal linking depth | Technical Foundation | Discoverability. Pages buried 3+ clicks from the homepage get crawled significantly less. Orphan pages with no internal links pointing to them are very unlikely to be discovered by AI crawlers. | Maintain a flat, well-linked site structure. Ensure every important page is reachable within 2-3 clicks. |
| Page speed | Technical Foundation | Crawl prioritization. AI crawlers process millions of pages; slow-responding servers get deprioritized. Industry analyses have consistently correlated faster page loads with higher citation rates. | Optimize server response times, particularly Time to First Byte (TTFB). |
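You can measure click depth and find orphan pages with a breadth-first search over your internal link graph. The sketch below assumes you already have that graph (for example, from a site crawl) as a mapping from each URL to the URLs it links to. It then compares the graph against your sitemap's URL list. The default threshold follows the table's "within 2-3 clicks" guidance. The function names and sample URLs are hypothetical.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], home: str = "/") -> dict[str, int]:
    """Breadth-first search from the homepage: fewest clicks needed to reach each page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit in BFS order is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def crawl_risks(links: dict[str, list[str]], sitemap_urls: list[str],
                home: str = "/", max_depth: int = 3) -> tuple[list[str], list[str]]:
    """Split sitemap URLs into orphans (unreachable by links) and pages deeper than max_depth."""
    depths = click_depths(links, home)
    orphans = sorted(url for url in sitemap_urls if url not in depths)
    deep = sorted(url for url in sitemap_urls if depths.get(url, -1) > max_depth)
    return orphans, deep


if __name__ == "__main__":
    links = {
        "/": ["/services", "/blog"],
        "/services": ["/services/geo"],
        "/services/geo": ["/services/geo/audit"],
        "/services/geo/audit": ["/services/geo/audit/faq"],
    }
    sitemap = ["/", "/services", "/blog", "/services/geo/audit/faq", "/old-landing-page"]
    print(crawl_risks(links, sitemap))  # (['/old-landing-page'], ['/services/geo/audit/faq'])
```

Orphans are listed in the sitemap but unreachable through internal links. Deep pages are reachable but sit beyond the chosen click budget. Each list suggests a different fix: add links to orphans, and flatten the path to deep pages.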
How do you check where you stand?
Most businesses have never tested their AI accessibility. They assume good Google rankings translate to AI citations. That assumption is often wrong. As we detail in our main GEO guide, the majority of Google's top links are not cited by AI platforms.
Start with these five manual checks:
- Test your robots.txt. Visit yourdomain.com/robots.txt and search for GPTBot, OAI-SearchBot, ClaudeBot, and PerplexityBot. If any are disallowed, your content may be invisible to that platform.
- View your page source. Right-click any key page, select "View Page Source," and look for your actual content in the HTML. If you see mostly JavaScript bundles and an empty <div id="root">, most AI crawlers are seeing a blank page.
- Check for JSON-LD schema. Search your page source for application/ld+json. If it's missing, or if the schema contains only a name and URL with no other attributes, you have a schema gap.
- Look for llms.txt. Visit yourdomain.com/llms.txt. If you get a 404, you don't have one. That's typical, but it's a missed opportunity.
- Query AI engines about your brand. Ask ChatGPT and Perplexity a question your site should answer. If they don't cite you, something in the technical chain may be broken.
This article has covered four core technical failure points, each of which can independently make you invisible, plus llms.txt as an emerging fifth factor. A manual check might catch one or two problems. A complete picture requires testing all of them together, which is what a GEO audit provides.
Frequently asked questions
Why isn't my site showing up in ChatGPT or Perplexity? The most common causes are technical, not content-related. Your robots.txt may be blocking AI crawlers. Your site may rely on client-side JavaScript that AI crawlers can't render. Or you may lack the structured data signals AI platforms use to understand and trust your content. Check each of the technical factors covered in this article before assuming the problem is your content.
Do AI crawlers render JavaScript? Six of eight major AI crawlers do not execute JavaScript as of March 2026, including GPTBot, OAI-SearchBot, ClaudeBot, and PerplexityBot. The exceptions are Googlebot, which powers Google's AI Overviews, and Bingbot, which has partial JavaScript rendering capability. If your site is a client-rendered single-page application, most AI crawlers see only the raw HTML your server sends, which is often an empty shell.
Does blocking GPTBot stop my content from appearing in ChatGPT? Not entirely. Blocking reduces your visibility significantly but may not eliminate it. ChatGPT relies on Bing's index as a backend, so content indexed by Bingbot can still surface. ChatGPT-User also bypasses robots.txt directives at a documented 42% rate. Blocking is a signal of intent, not a guarantee of exclusion.
Further reading
- GEO Explained: How AI Search Decides Which Businesses to Cite, and Which to Ignore. Covers all five pillars of Generative Engine Optimization, from content quality to brand authority to the technical access layer discussed here.
- How AI Search Engines Choose What to Cite. The ranking and retrieval logic behind AI citation decisions, including how vector similarity, entity signals, and freshness interact.
- How to Write Content That AI Will Actually Cite. The content quality layer of GEO: definitional authority, answer capsules, evidence integration, and structural patterns that maximize AI extractability.
- The Zero-Click Problem. What happens to your traffic when AI provides the answer directly, and how to position your brand within those responses.
About the author: Fadi El Chami is the founder of Aivarize, a GEO consultancy that helps businesses become visible and citable in AI-powered search. He developed the Aivarize GEO Scoring Index, a five-dimension framework for measuring and improving how AI platforms discover, evaluate, and cite business content, and brings over 20 years of B2B sales and strategy experience to the emerging challenge of AI visibility.