Which AI bots should a clinic site allow versus block?

Allow the bots whose vendors run products that can cite your site and send users back to it: GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-Web, anthropic-ai, PerplexityBot, Perplexity-User, Google-Extended, Applebot, Applebot-Extended, Bytespider, Meta-ExternalAgent, Amazonbot. Block CCBot: it feeds Common Crawl, a training corpus with no citation pathway, so allowing it costs bandwidth without earning visibility.

What is /llms.txt and why does a vet clinic need one?

It's a flat-file summary of the site at the root path, written for language models rather than humans. The proposed standard at llmstxt.org gives engines a low-noise version of your business — location, hours, team, plans, key URLs, disclaimer. When a model is grounded against it, you control the framing. For vet clinics specifically, pricing is the most-asked question across AI queries, so we pair it with a /pricing.md file in machine-readable Markdown.

Why use a server-side OG registry instead of query-string OG images?

Query-string OG endpoints accept attacker-controlled text on your real brand URL. A server-side registry resolves the title and category from the request path — nothing about the rendered card comes from the query string, so the endpoint cannot be coerced into rendering arbitrary text on a sixteenmilevet.com URL.

Making a vet clinic citable by ChatGPT, Perplexity, and Claude

We shipped a coordinated AI-EO + technical SEO pass on sixteenmilevet.com: allow citation-pathway AI bots and block training-only ones, expose /llms.txt and /pricing.md at the root, harden authorship and reviewer JSON-LD, replace the hand-rolled sitemap with one derived from the router, and lift Core Web Vitals on the homepage. The work shipped on 2026-05-09; we’ll update this post with measured results in a few weeks.

Sixteen Mile Veterinary Clinic is a single-location practice in Oakville. The site runs on Astro 5 (server output), React 19, and Tailwind 4, deployed to Vercel. The brief was simple: make the site eligible for citation by AI answer engines, tighten the technical-SEO surface, and lift Core Web Vitals on the homepage.

The argument behind the work is the order. There is no point optimising for engines that can’t reach you. There is no point earning crawl access if your authorship layer is so vague Google can’t tell who wrote the article. There is no point fixing schema if the canonical host disagrees between tags. Performance is the most familiar lever, and the smallest one when the structured-data layer is already broken.

So: crawler policy first. Authorship next. Schema and sitemap hygiene third. Performance last.

1. Crawler policy: opt in to citation, opt out of training

We rewrote robots.txt around a simple distinction. Does this bot belong to a product that can send users to the source, or does it only feed a training corpus?

Allowed — GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-Web, anthropic-ai, PerplexityBot, Perplexity-User, Google-Extended, Applebot, Applebot-Extended, Bytespider, Meta-ExternalAgent, Amazonbot.
Blocked — CCBot. It feeds Common Crawl, a training corpus with no citation pathway. Allowing it costs bandwidth without earning visibility.
Crawl-delay — SemrushBot, AhrefsBot, DotBot. Useful tools, kept off the critical path.

This is the cheapest AI-EO change available and the one most clinic sites get wrong. They either block everything or allow everything.
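As a rough sketch of the policy above, the function below renders a robots.txt from three lists. Only the bot names come from this post; the `CrawlerPolicy` type, the `renderRobotsTxt` helper, the 10-second delay, and the sitemap URL are illustrative assumptions, not the clinic's actual file.

```typescript
// Hypothetical policy shape. Only the bot names are taken from the article.
type CrawlerPolicy = {
  allow: string[];
  block: string[];
  throttle: string[];
  crawlDelaySeconds: number; // assumed value; the article does not state one
  sitemapUrl: string; // assumed location of the router-driven sitemap
};

const policy: CrawlerPolicy = {
  allow: [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "Claude-Web",
    "anthropic-ai", "PerplexityBot", "Perplexity-User", "Google-Extended",
    "Applebot", "Applebot-Extended", "Bytespider", "Meta-ExternalAgent", "Amazonbot",
  ],
  block: ["CCBot"],
  throttle: ["SemrushBot", "AhrefsBot", "DotBot"],
  crawlDelaySeconds: 10,
  sitemapUrl: "https://www.sixteenmilevet.com/sitemap.xml",
};

// Emit one group per bot so each decision is explicit and easy to audit.
function renderRobotsTxt(p: CrawlerPolicy): string {
  const groups: string[] = [];
  for (const bot of p.allow) groups.push(`User-agent: ${bot}\nAllow: /`);
  for (const bot of p.block) groups.push(`User-agent: ${bot}\nDisallow: /`);
  for (const bot of p.throttle) {
    groups.push(`User-agent: ${bot}\nCrawl-delay: ${p.crawlDelaySeconds}`);
  }
  groups.push("User-agent: *\nAllow: /");
  groups.push(`Sitemap: ${p.sitemapUrl}`);
  return groups.join("\n\n") + "\n";
}

const robotsTxt = renderRobotsTxt(policy);
```

Crawl-delay is not part of the Robots Exclusion Protocol (RFC 9309) and not every crawler honours it, so treat that group as a request rather than a guarantee.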

2. `/llms.txt` and `/pricing.md`

Two flat files at the root, both linked from the sitemap.

/llms.txt follows the proposed standard for giving language models a clean, low-noise summary of the site. Ours covers location, hours, team, plan structure, key URLs, and the disclaimer. When a model is grounded against this URL, we control the framing.

/pricing.md is wellness-plan pricing in machine-readable Markdown — SMVC Club at $40/month, the P.A.L. Plan, exam fee, plan rules. Pricing is the single most-asked question across vet AI queries. Serving it as plain text lets answer engines quote it accurately rather than hallucinate.
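To make the file shape concrete, here is a minimal sketch of a builder for /llms.txt that follows the layout the llmstxt.org proposal describes: an H1 name, a blockquote summary, then H2 sections of Markdown links. The `renderLlmsTxt` helper, the type names, and the section heading are assumptions for illustration; the two URLs are the pages named in this post.

```typescript
// Hypothetical types; the real file may be hand-written rather than generated.
type LlmsLink = { title: string; url: string; note?: string };
type LlmsSection = { heading: string; links: LlmsLink[] };

function renderLlmsTxt(name: string, summary: string, sections: LlmsSection[]): string {
  // H1 title, blank line, blockquote summary.
  const lines: string[] = [`# ${name}`, "", `> ${summary}`];
  for (const section of sections) {
    lines.push("", `## ${section.heading}`, "");
    for (const link of section.links) {
      const note = link.note ? `: ${link.note}` : "";
      lines.push(`- [${link.title}](${link.url})${note}`);
    }
  }
  return lines.join("\n") + "\n";
}

const llmsTxt = renderLlmsTxt(
  "Sixteen Mile Veterinary Clinic",
  "Single-location veterinary practice in Oakville.",
  [
    {
      heading: "Key pages", // assumed heading
      links: [
        {
          title: "Pricing",
          url: "https://www.sixteenmilevet.com/pricing.md",
          note: "wellness-plan pricing in Markdown",
        },
        { title: "Disclaimer", url: "https://www.sixteenmilevet.com/disclaimer" },
      ],
    },
  ],
);
```

Served with a plain-text content type, the output stays readable to both models and humans, which is the point of keeping it flat.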

3. Authorship and E-E-A-T

Google’s vet-content guidance is strict, and answer engines are converging on the same signals.

Per-post author registry. Posts written by credentialed staff (Dr. ... DVM) emit Person JSON-LD with the clinic as affiliation. Non-credentialed authors fall back to Organization.
Reviewer-aware authorship. Many of the 92 educational posts were drafted by editorial and reviewed by a clinician. Schema now enforces that every non-draft post has either an author or a reviews array. Reviewer-only posts render “Reviewed by Dr. …” and emit reviewedBy Person entries alongside the clinic as author.
Disclaimer. A /disclaimer page plus a per-post editorial disclaimer aside, linked from the footer. Tells humans and crawlers that articles are educational, not a substitute for an exam.
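The authorship rules above can be sketched as one function. This is an illustration under assumptions, not the site's code: the `Staff` and `Post` shapes and the `authorshipJsonLd` name are hypothetical, and because schema.org defines `reviewedBy` on `WebPage`, the sketch hangs reviewers on a `WebPage` node that wraps the `BlogPosting`. Where the real markup attaches them is not stated in this post.

```typescript
// Hypothetical content-collection shapes.
type Staff = { name: string; credentialed: boolean };
type Post = { title: string; draft: boolean; author?: Staff; reviews?: Staff[] };

type OrgNode = { "@type": "Organization"; name: string; url: string };
type PersonNode = { "@type": "Person"; name: string; affiliation: OrgNode };
type PageNode = {
  "@context": "https://schema.org";
  "@type": "WebPage";
  mainEntity: { "@type": "BlogPosting"; headline: string; author: PersonNode | OrgNode };
  reviewedBy?: PersonNode[];
};

const CLINIC: OrgNode = {
  "@type": "Organization",
  name: "Sixteen Mile Veterinary Clinic",
  url: "https://www.sixteenmilevet.com",
};

function toPerson(s: Staff): PersonNode {
  return { "@type": "Person", name: s.name, affiliation: CLINIC };
}

function authorshipJsonLd(post: Post): PageNode {
  const reviewers = (post.reviews ?? []).map(toPerson);
  // Rule from the article: a published post needs an author or a reviewer.
  if (!post.draft && !post.author && reviewers.length === 0) {
    throw new Error(`"${post.title}" needs an author or at least one reviewer`);
  }
  // Credentialed authors become a Person; everyone else falls back to the clinic.
  const author = post.author && post.author.credentialed ? toPerson(post.author) : CLINIC;
  const node: PageNode = {
    "@context": "https://schema.org",
    "@type": "WebPage",
    mainEntity: { "@type": "BlogPosting", headline: post.title, author },
  };
  if (reviewers.length > 0) node.reviewedBy = reviewers;
  return node;
}
```

Throwing at build time is one way to enforce the rule; a content-collection schema check would do the same job earlier.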

4. Structured-data hygiene

Small fixes, outsized effect on how Google and answer engines reconcile entities.

We canonicalised every JSON-LD URL to https://www.sixteenmilevet.com so BlogPosting, the sitemap, and <link rel="canonical"> agree. Mismatched hosts (apex vs. www) silently demote rich-result eligibility.
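One way to keep those three places in agreement is to route every URL through a single helper. The sketch below is an assumption about how that could look; `canonicalUrl` is a hypothetical name, and dropping the query string and fragment is a choice made here, not something this post specifies.

```typescript
const CANONICAL_ORIGIN = "https://www.sixteenmilevet.com";

// Accepts a path or an absolute URL on any host variant and returns the
// same path on the canonical origin, without query string or fragment.
function canonicalUrl(pathOrUrl: string): string {
  const parsed = new URL(pathOrUrl, CANONICAL_ORIGIN);
  // Concatenate rather than re-parse, so a path starting with "//" cannot
  // be read as a protocol-relative URL pointing at another host.
  return `${CANONICAL_ORIGIN}${parsed.pathname}`;
}
```

Calling `canonicalUrl` from the JSON-LD builders, the sitemap, and the canonical link tag means an apex-versus-www mismatch cannot creep back in through one of them.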

We removed a stale specialOpeningHoursSpecification for Canada Day 2025 that was still being emitted in 2026. Schema lying about hours is worse than no schema.
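A small guard can stop expired entries from being emitted at all. This is a sketch under assumptions: the `SpecialHours` shape and the `currentSpecialHours` helper are hypothetical, and dates are assumed to be stored as ISO `YYYY-MM-DD` strings, which compare correctly as plain strings.

```typescript
// Hypothetical shape for entries under specialOpeningHoursSpecification.
type SpecialHours = {
  "@type": "OpeningHoursSpecification";
  validFrom: string; // ISO date, YYYY-MM-DD
  validThrough: string; // ISO date, YYYY-MM-DD
};

// Keep only entries that have not yet expired as of `today`.
function currentSpecialHours(entries: SpecialHours[], today: string): SpecialHours[] {
  return entries.filter((entry) => entry.validThrough >= today);
}

const canadaDay2025: SpecialHours = {
  "@type": "OpeningHoursSpecification",
  validFrom: "2025-07-01",
  validThrough: "2025-07-01",
};

// Evaluated on the ship date, the 2025 entry is dropped.
const stillValid = currentSpecialHours([canadaDay2025], "2026-05-09");
```

Filtering at render time means a forgotten holiday entry ages out on its own instead of waiting for someone to delete it.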

We added optional faq frontmatter on blog posts that emits FAQPage JSON-LD. The hardcoded “Sarah Bishop welcome” FAQ block was the first consumer.
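As an illustration of the frontmatter-to-schema step, the sketch below maps a list of question and answer pairs to FAQPage JSON-LD. The field names `question` and `answer` and the `faqPageJsonLd` helper are assumptions; the post does not show the actual frontmatter shape.

```typescript
// Hypothetical frontmatter shape for the optional `faq` field.
type FaqItem = { question: string; answer: string };

// Returns null when a post has no FAQ, so the caller can skip the script tag.
function faqPageJsonLd(faq: FaqItem[] | undefined) {
  if (!faq || faq.length === 0) return null;
  return {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: faq.map((item) => ({
      "@type": "Question",
      name: item.question,
      acceptedAnswer: { "@type": "Answer", text: item.answer },
    })),
  };
}

// Placeholder content, not taken from a real post.
const exampleFaq = faqPageJsonLd([
  { question: "Example question?", answer: "Example answer." },
]);
```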

We replaced ad-hoc breadcrumb components with BreadcrumbList JSON-LD on detail pages, plus a real <nav aria-label="Breadcrumb"> in the markup. A dead BreadcrumbsContentPages component that was imported but never rendered got deleted on the way through.
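The JSON-LD half of that change might look like the sketch below, which turns an ordered trail into a BreadcrumbList. The `Crumb` type, the `breadcrumbJsonLd` helper, and the `ticks` slug are hypothetical; the same trail array could also drive the visible breadcrumb nav so the two never drift apart.

```typescript
const SITE_ORIGIN = "https://www.sixteenmilevet.com";

// Hypothetical trail entry: display name plus site-relative path.
type Crumb = { name: string; path: string };

function breadcrumbJsonLd(crumbs: Crumb[]) {
  return {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    itemListElement: crumbs.map((crumb, index) => ({
      "@type": "ListItem",
      position: index + 1, // schema.org positions start at 1
      name: crumb.name,
      item: `${SITE_ORIGIN}${crumb.path}`,
    })),
  };
}

const trail = breadcrumbJsonLd([
  { name: "Home", path: "/" },
  { name: "Blog", path: "/blog" },
  { name: "Ticks", path: "/blog/topic/ticks" }, // illustrative slug
]);
```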

5. One sitemap, derived from the router

The previous setup had two files, sitemap-index.xml.ts and sitemap-0.xml.ts, plus a hardcoded list of blog slugs. We replaced all of it with a single sitemap.xml.ts driven by the router:

const decisions: Record<keyof typeof routes, SitemapDecision> = { ... }

Adding a route without making a sitemap decision is now a TypeScript error. Blog posts come from the content collection, so new articles appear automatically. The new /blog/<n> and /blog/topic/<slug>[/<n>] paths are included.
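To show why the compiler catches a missing decision, here is a self-contained sketch of the pattern. The `routes` table, the shape of `SitemapDecision`, and `renderSitemap` are all assumptions made for illustration; the real definitions are not shown in this post.

```typescript
// Hypothetical route table; the real `routes` object is not shown in the article.
const routes = {
  home: "/",
  blog: "/blog",
  pricing: "/pricing.md",
  ogImage: "/og/[...path].png",
} as const;

// Assumed shape: every route is either included or excluded with a reason.
type SitemapDecision = { include: true } | { include: false; reason: string };

// Record<keyof typeof routes, ...> requires one entry per route key, so
// adding a key to `routes` without a decision fails type-checking.
const decisions: Record<keyof typeof routes, SitemapDecision> = {
  home: { include: true },
  blog: { include: true },
  pricing: { include: true },
  ogImage: { include: false, reason: "image endpoint, not a page" },
};

function renderSitemap(origin: string): string {
  const keys = Object.keys(routes) as Array<keyof typeof routes>;
  const urls = keys
    .filter((key) => decisions[key].include)
    .map((key) => `  <url><loc>${origin}${routes[key]}</loc></url>`);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    "</urlset>",
  ].join("\n");
}

const sitemapXml = renderSitemap("https://www.sixteenmilevet.com");
```

Forcing an explicit exclusion with a reason, rather than silently omitting a route, keeps the "why is this page not in the sitemap" answer in the code.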

Astro footnote: pagination pages originally used getStaticPaths, which Astro skips for on-demand-rendered routes under output: "server". We swapped to request-time Astro.params.page parsing with a redirect-to-page-1 fallback for out-of-range values.
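The request-time parsing can be sketched as a pure function that the page calls with the raw param. `resolvePage` and `PageResult` are hypothetical names, and treating non-numeric values the same as out-of-range ones is an assumption; in an Astro page the redirect branch would be handed to `Astro.redirect`.

```typescript
type PageResult =
  | { kind: "ok"; page: number }
  | { kind: "redirect"; to: string };

// `raw` is the value of Astro.params.page; undefined means the bare /blog URL.
function resolvePage(
  raw: string | undefined,
  totalPosts: number,
  pageSize: number,
  firstPagePath: string,
): PageResult {
  const totalPages = Math.max(1, Math.ceil(totalPosts / pageSize));
  if (raw === undefined) return { kind: "ok", page: 1 };
  // Accept plain digits only, so "2abc", "-1", and "1.5" are all rejected.
  const page = /^\d+$/.test(raw) ? Number(raw) : Number.NaN;
  if (!Number.isInteger(page) || page < 1 || page > totalPages) {
    return { kind: "redirect", to: firstPagePath };
  }
  return { kind: "ok", page };
}

// 92 posts at 12 per page gives 8 pages.
const lastPage = resolvePage("8", 92, 12, "/blog");
const pastTheEnd = resolvePage("9", 92, 12, "/blog");
```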

6. Topic-based blog archive

The blog grew to 92 posts across ticks, heartworm, fleas, dog allergies, safe foods, and clinic updates. We replaced the flat archive with:

/blog, paginated 12 per page, with a topic-chip filter row.
/blog/topic/<slug> archives for each topic, also paginated.
A single TOPICS registry (src/lib/blog/topics.ts) feeding archive pages, post breadcrumbs, the sitemap, and the footer column. One file to update; everything else stays in sync.
Featured-post hero on /blog; “Continue Reading” on each post with sessionStorage-backed visited-state highlighting.
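A single registry of this kind could be as small as the sketch below. The slugs and labels are illustrative guesses based on the topic names listed above, the per-topic post count in the usage line is made up, and `topicArchivePaths` is a hypothetical helper showing how the sitemap might derive archive URLs from the registry. Numbering extra pages from 2 is also an assumption.

```typescript
// Illustrative registry; the article names the topics but not their slugs.
const TOPICS = [
  { slug: "ticks", label: "Ticks" },
  { slug: "heartworm", label: "Heartworm" },
  { slug: "fleas", label: "Fleas" },
  { slug: "dog-allergies", label: "Dog allergies" },
  { slug: "safe-foods", label: "Safe foods" },
  { slug: "clinic-updates", label: "Clinic updates" },
] as const;

type TopicSlug = (typeof TOPICS)[number]["slug"];

const PAGE_SIZE = 12; // matches the 12-per-page archive

// Build /blog/topic/<slug> plus /blog/topic/<slug>/<n> for pages 2 and up.
function topicArchivePaths(postCounts: Partial<Record<TopicSlug, number>>): string[] {
  const paths: string[] = [];
  for (const topic of TOPICS) {
    const count = postCounts[topic.slug] ?? 0;
    const pages = Math.max(1, Math.ceil(count / PAGE_SIZE));
    paths.push(`/blog/topic/${topic.slug}`);
    for (let n = 2; n <= pages; n++) {
      paths.push(`/blog/topic/${topic.slug}/${n}`);
    }
  }
  return paths;
}

// Hypothetical count: 30 tick posts span three pages.
const archivePaths = topicArchivePaths({ ticks: 30 });
```

Because `TopicSlug` is derived from the array, a typo in a slug anywhere else in the codebase becomes a type error rather than a dead link.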

The topic archives matter for AI-EO specifically. They give answer engines clean topical hubs to cite when a user asks a category-shaped question (“tick prevention in Ontario”) rather than a long-tail one.

7. Dynamic Open Graph images

Every page now has its own 1200×630 OG card, generated on the fly via @vercel/og:

The /og/<path>.png endpoint resolves the title and category label from a server-side registry keyed by path. Nothing about the rendered card comes from the query string. The endpoint cannot be coerced into rendering attacker-controlled text on a sixteenmilevet.com URL.
SEOHead falls through: explicit image prop → registered dynamic OG → static /ogimage.png.
Vercel includeFiles bundles the brand fonts and white-logo SVG into the serverless function so the renderer is self-contained.
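The lookup and the fallthrough described above can be sketched without the rendering step. Everything named here is an assumption: the registry contents, the `resolveOgEntry` and `pickOgImage` helpers, and the choice to map the homepage to `/og/index.png`. The part that matters is that only `pathname` is read, so nothing in the query string can reach the card.

```typescript
type OgEntry = { title: string; category: string };

// Hypothetical registry contents, keyed by page path.
const OG_REGISTRY = new Map<string, OgEntry>([
  ["/", { title: "Sixteen Mile Veterinary Clinic", category: "Home" }],
  ["/blog", { title: "Blog", category: "Articles" }],
]);

// Map /og/<path>.png back to a page path and look it up.
function resolveOgEntry(requestUrl: string): OgEntry | null {
  const { pathname } = new URL(requestUrl); // search params are never read
  const match = /^\/og\/(.+)\.png$/.exec(pathname);
  if (!match) return null;
  const pagePath = match[1] === "index" ? "/" : `/${match[1]}`;
  return OG_REGISTRY.get(pagePath) ?? null; // unknown paths get no card
}

// Fallthrough order: explicit prop, then registered dynamic card, then static image.
function pickOgImage(pagePath: string, explicitImage?: string): string {
  if (explicitImage) return explicitImage;
  if (OG_REGISTRY.has(pagePath)) {
    return pagePath === "/" ? "/og/index.png" : `/og${pagePath}.png`;
  }
  return "/ogimage.png";
}

const tampered = resolveOgEntry("https://www.sixteenmilevet.com/og/blog.png?title=Hacked");
```

In this sketch `tampered` resolves to the registered Blog entry; the `title` parameter is simply ignored.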

Every share, every AI-engine link preview, every Google og:image lookup gets a branded card without us hand-designing one per page.

8. Core Web Vitals on the homepage

LCP and CLS feed into both classical SEO and the freshness of any AI summary that re-fetches the page.

We preloaded Lexend Deca and Open Sans Latin woff2 subsets in the base Layout so above-the-fold text never blocks on font fetch.
We preloaded the homepage hero image with fetchpriority="high". Below-the-fold <img> tags get loading="lazy" + decoding="async".
We converted the Get-to-Know-Us carousel’s first slide from a CSS background-image to a real <img> so it can actually be marked high-priority and sync-decoded.
We replaced the hero’s .jpg with a hand-tuned .webp (101 KB) for an immediate byte-size win.
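For reference, the preload hints might be generated along these lines. This is only a sketch: the `preloadTag` helper and the file paths are hypothetical, and the real Layout may write the tags by hand. The one detail worth copying is `crossorigin` on font preloads, since fonts are fetched in CORS mode and a preload without it is not reused.

```typescript
type Preload = {
  href: string;
  as: "font" | "image";
  type?: string;
  highPriority?: boolean;
};

function preloadTag(p: Preload): string {
  const attrs = ['rel="preload"', `as="${p.as}"`, `href="${p.href}"`];
  if (p.type) attrs.push(`type="${p.type}"`);
  if (p.as === "font") attrs.push("crossorigin"); // required for the preload to match the font fetch
  if (p.highPriority) attrs.push('fetchpriority="high"');
  return `<link ${attrs.join(" ")}>`;
}

// Hypothetical asset paths.
const headTags = [
  preloadTag({ href: "/fonts/lexend-deca-latin.woff2", as: "font", type: "font/woff2" }),
  preloadTag({ href: "/fonts/open-sans-latin.woff2", as: "font", type: "font/woff2" }),
  preloadTag({ href: "/images/hero.webp", as: "image", type: "image/webp", highPriority: true }),
];
```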

What we’re tracking

The work shipped today (2026-05-09). Results take weeks. We’ll come back and replace this section with a measured update; the watchlist:

Citation rate in AI answer engines. Tracked through ChatGPT search, Perplexity, and Claude search for vet-pricing and topic-shaped queries. We expect /pricing.md and the topic archives to surface first.
Rich-result eligibility in Google Search Console — BlogPosting, FAQPage, BreadcrumbList, LocalBusiness. The Canada-Day-2025 stale-hours fix should clear an existing warning.
Indexed-page count. With the router-driven sitemap, the new topic archives and pagination pages should show up in coverage reports within the first crawl cycle.
Core Web Vitals on the homepage. LCP is the one to watch; the hero .webp and font preload should move it.

Why this order matters

Most of these changes are individually small. The win is doing them together, in this order:

Crawler policy first. No point optimising for engines that can’t reach you.
Authorship and disclaimer next. Vet content is YMYL, and both Google and answer engines lean on identifying who wrote and reviewed each article.
Schema and sitemap hygiene third. Once the entity model is consistent, every other signal compounds.
Performance last. Most familiar lever; smallest one when the structured-data layer is already broken.

Sources

Sixteen Mile Veterinary Clinic — sixteenmilevet.com
llms.txt — proposed standard for site summaries written for language models
@vercel/og — Open Graph image generation on Vercel
Astro — getStaticPaths reference

