/methodology · published reference · v3.1

The Caboo Score.

Six sub-scores, weighted by what predicts whether a buyer asks AI and ends up at your door. The full equation, signed and versioned. This document is the contract: when something Caboo says about your business is challenged, this is the page we're held to.

Signed Methodology v3.1 · published Apr 30, 2026 · Caboo Methodology Committee

§ 1. Overview

What the score measures, in one paragraph.

The Caboo Score is a 0–100 number that estimates how often a buyer asking AI for what your business sells ends up reading your name. It rolls up six measurable signals into one figure. It does not measure what individual users see (they get personalized answers), or how good your business actually is (we leave that to your customers). It measures one specific layer: the recommendation surface AI assistants present when asked.

The score is comparable across businesses in the same category and region. It updates each scan; week-over-week swings of ±5 points are normal model variance and are not on you. Larger movements reflect real changes in your visibility — or your competitors'.

§ 1.1. Freshness layer

How Caboo stays current as AI surfaces change.

Caboo measures the repeatable, comparable recommendation layer across live AI search surfaces. Personalized consumer answers can differ.

Each scan is tied to a scan profile and model pack. New models first enter candidate and shadow mode, run through canary prompts, then become active only through a published methodology change. Older scans are marked fresh, warm, or cold so customers know whether a score is current or needs a retest.

Freshness health: Static baseline. Live health appears when the API status endpoint is available.
Active profile: 2026-05-medspa-us-local. The complete methodology bundle used by new scans.
Model pack: 2026-05. The exact AI surface bundle used for comparable scoring.
Crawler directive set: ai-crawlers-2026-05. The robots.txt policy behind Fix Pack crawler advice.
Surface coverage: Direct + OpenRouter breadth. Direct surfaces expose deeper diagnostics; breadth surfaces expand coverage.
Current: Methodology v3.1 baseline (Apr 30, 2026). Initial published scan profile, model pack, prompt pack, and crawler directive set.

§ 2. The equation

The full math, in one block.

Each sub-score is a 0–1 measurement, multiplied by its weight, summed, and rendered as a 0–100 score. The weights are not arbitrary; each one defends itself in the relevant section below.

Caboo Score = 100 × ( 0.35 × Visibility
                    + 0.20 × Preference
                    + 0.15 × Accuracy
                    + 0.15 × Source Strength
                    + 0.10 × Technical
                    + 0.05 × Percentile )
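The weighted sum can be sketched in a few lines of Python as a sanity check. The sub-score values below are the Aura Medical Spa worked examples from §3; every name here is illustrative, not Caboo's internal code.

```python
WEIGHTS = {
    "visibility": 0.35, "preference": 0.20, "accuracy": 0.15,
    "source_strength": 0.15, "technical": 0.10, "percentile": 0.05,
}

def caboo_score(subscores: dict) -> float:
    """Weighted sum of six 0-1 sub-scores, rendered on a 0-100 scale."""
    return 100 * sum(WEIGHTS[k] * subscores[k] for k in WEIGHTS)

# Aura Medical Spa's worked sub-scores from §3:
aura = {
    "visibility": 11 / 40,           # §3.1: 0.275
    "preference": 17 / 33,           # §3.2: ~0.515
    "accuracy": 15 / 18,             # §3.3: ~0.833
    "source_strength": 0.55,         # §3.4: reported value after clamp
    "technical": 1 / 10,             # §3.5: 0.10
    "percentile": (142 - 88) / 142,  # §3.6: ~0.380
}
print(round(caboo_score(aura)))  # → 44
```

Summing Aura's weighted contributions (0.096 + 0.103 + 0.125 + 0.083 + 0.010 + 0.019) gives roughly 0.436, which renders as a Caboo Score of 44.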

§ 3. Sub-scores in detail

Each weight defends itself.

For every sub-score, four things are documented: what we measure, how we measure it, a worked example against a sample business, and the caveats we know about. The caveats are where a lot of trust gets earned — methodology pages that hide their limits aren't being honest about them.

Visibility § 3.1

0.35 weight
What we measure

How often the business appears at all — by name — when AI assistants are asked the kind of question a buyer would ask.

How we measure it

Each scan generates 10 buyer-intent prompts and runs them across 4 platforms (40 responses total). Visibility = appearances ÷ 40. Tangential mentions count; competitor-only mentions do not.

Worked example

Aura Medical Spa appeared in 11 of 40 responses. Visibility = 0.275. Weighted contribution = 0.275 × 0.35 = 0.096.

Caveats

Visibility ≠ Recommendation. A buyer reading "Aura is one option, though Skin by Lovely is more frequently cited" sees the name but isn't being steered. That distinction is what Preference catches in §3.2.

Step-by-step walkthrough
  1. Compose 10 buyer-intent prompts ("best med spa in Phoenix for tox", "med spa near downtown Austin", etc.) and run each through ChatGPT, Claude, Gemini, and Perplexity — 40 responses logged.

  2. Match by name: tangential mentions count, competitor-only mentions don't. Aura Medical Spa surfaced in 11 of 40.

  3. Compute raw visibility: 11 ÷ 40 = 0.275.

  4. Apply weight: 0.275 × 0.35 = 0.096 contribution to Caboo Score (out of a possible 0.35).
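The counting step above can be sketched as follows, assuming a naive substring matcher; Caboo's real name-matching logic is not specified in this document.

```python
def visibility(responses: list, business: str) -> float:
    """Fraction of logged responses that name the business at all."""
    hits = sum(business.lower() in r.lower() for r in responses)
    return hits / len(responses)

# 11 of 40 logged responses name Aura Medical Spa (per the worked example);
# the response texts themselves are placeholders.
responses = ["... Aura Medical Spa is one option ..."] * 11 \
          + ["... a competitor-only answer ..."] * 29
v = visibility(responses, "Aura Medical Spa")   # 11 / 40 = 0.275
```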

Preference § 3.2

0.20 weight
What we measure

Whether AI recommends the business actively, or just mentions it among options. The difference between "go to Aura" and "Aura is one option."

How we measure it

Each appearance is graded into one of three positional tiers — recommended (named first or as the answer), among-options (named in a list), buried (named only in a comparative or qualified clause). Preference = (3·top + 2·mid + 1·low) ÷ (3 · total appearances).

Worked example

Aura's 11 appearances graded as 1 top, 4 mid, 6 low. Preference = (3 + 8 + 6) ÷ 33 = 0.515. Weighted = 0.103.

Caveats

We can't read the model's mind-state. A neutral mention before a competitor and a neutral mention after are weighted the same. We're tracking ordering effects in v3.2.

Step-by-step walkthrough
  1. For each appearance, classify the position: top (named first or as the answer) = 3 pts, mid (in a list) = 2 pts, low (only in a comparative qualifier) = 1 pt.

  2. Aura's 11 appearances graded: 1 top, 4 mid, 6 low.

  3. Sum points: (3·1) + (2·4) + (1·6) = 17. Maximum possible: 3 · 11 appearances = 33.

  4. Compute: 17 ÷ 33 = 0.515. Apply weight: 0.515 × 0.20 = 0.103.
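The tier-weighted calculation can be sketched as below; the tier labels are illustrative shorthand for the three positional tiers.

```python
POINTS = {"top": 3, "mid": 2, "low": 1}

def preference(tiers: list) -> float:
    """Points earned as a share of the maximum possible (3 per appearance)."""
    return sum(POINTS[t] for t in tiers) / (3 * len(tiers))

tiers = ["top"] * 1 + ["mid"] * 4 + ["low"] * 6   # Aura's 11 appearances
p = preference(tiers)                             # 17 / 33 ≈ 0.515
```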

Accuracy § 3.3

0.15 weight
What we measure

Whether the AI describes the business correctly — services offered, location, hours, price band, distinctive offerings. Hallucinations get caught here.

How we measure it

For each appearance, the AI's description is fact-checked against the business's website + verified directory listings (Yelp, Google Business, Apple Business Connect). Accuracy = supported facts ÷ (supported + contradicted facts). Unverifiable claims are excluded from the denominator.

Worked example

Aura's appearances contained 23 factual claims: 15 supported, 5 unverifiable, 3 contradicted. Accuracy = 15 ÷ (15 + 3) = 0.833. Weighted = 0.125.

Caveats

AI confidence ≠ AI correctness. A model can confidently state hours that aren't real. We log contradictions and surface them on your dashboard for review — they're not just deductions, they're warnings.

Step-by-step walkthrough
  1. Extract every factual claim made by AI about Aura — services, hours, address, certifications, price band. 23 factual claims surfaced across all appearances.

  2. Cross-check each against the canonical truth set: business website + Google Business Profile + Yelp + Apple Business Connect.

  3. Bucket: 15 supported, 5 unverifiable (excluded from denominator), 3 contradicted.

  4. Compute: 15 ÷ (15 + 3) = 0.833. Apply weight: 0.833 × 0.15 = 0.125. The 3 contradictions surface on the dashboard as alerts.
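The denominator rule can be sketched as below; unverifiable claims are accepted as input but deliberately excluded from the calculation.

```python
def accuracy(supported: int, contradicted: int, unverifiable: int = 0) -> float:
    """Supported ÷ (supported + contradicted).

    Unverifiable claims are taken as an argument only to make the rule
    explicit: they never enter the denominator.
    """
    return supported / (supported + contradicted)

a = accuracy(supported=15, contradicted=3, unverifiable=5)   # 15/18 ≈ 0.833
```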

Source Strength § 3.4

0.15 weight
What we measure

Whether AI is citing strong, current sources around the business — press, reputable directories, recent verified reviews.

How we measure it

Each citation gets a tier score — Tier 1 (major press, .gov, .edu) = 1.0, Tier 2 (industry directories, verified review platforms) = 0.7, Tier 3 (social, blog, low-authority) = 0.4, no citations = 0. Source Strength = average tier score across appearances. Where a platform doesn't expose citations, we omit rather than assume zero.

Worked example

Aura's grounded responses cited 8 sources — 0 Tier-1, 6 Tier-2 (Yelp, Google), 2 Tier-3 (Reddit). Raw average = 0.625, reported as 0.55 after the citation-confidence clamp. Weighted = 0.083.

Caveats

Citation availability varies sharply by platform — Perplexity exposes them, Gemini partially, Claude and ChatGPT only when web-grounded. Businesses on platforms that hide citations may score lower without doing anything wrong.

Step-by-step walkthrough
  1. Tag each citation in Aura's grounded responses by tier: T1 (major press, .gov, .edu) = 1.0, T2 (industry directories, verified review platforms) = 0.7, T3 (social, blog, low-authority) = 0.4.

  2. Aura's 8 citations bucketed: 0 T1, 6 T2 (Yelp ×4, Google ×2), 2 T3 (Reddit threads).

  3. Sum tier scores: (0·1.0) + (6·0.7) + (2·0.4) = 5.0. Compute average: 5.0 ÷ 8 = 0.625. (Reported as 0.55 after clamping for citation-confidence.)

  4. Apply weight: 0.55 × 0.15 = 0.083. The 0 T1 citations are flagged as the highest-leverage gap.
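The tier averaging can be sketched as below. The citation-confidence clamp is not specified in this document, so the sketch stops at the raw average (0.625 in the worked example, before the reported 0.55).

```python
TIER_SCORE = {1: 1.0, 2: 0.7, 3: 0.4}

def source_strength(citation_tiers: list) -> float:
    """Average tier score across citations; no citations scores 0."""
    if not citation_tiers:
        return 0.0
    return sum(TIER_SCORE[t] for t in citation_tiers) / len(citation_tiers)

raw = source_strength([2] * 6 + [3] * 2)   # Aura: 6 Tier-2 + 2 Tier-3 → 0.625
```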

Technical Readiness § 3.5

0.10 weight
What we measure

Whether the business's website is AI-readable: schema markup, FAQ structured data, crawler accessibility, semantic HTML, fast time-to-first-byte.

How we measure it

A 10-point checklist run against the site — 2 points for LocalBusiness JSON-LD schema, 1 point each for: FAQ markup, robots.txt allowing GPTBot/PerplexityBot, llms.txt, sitemap.xml, OpenGraph metadata, mobile-responsive, canonical URLs, TTFB < 1s.

Worked example

Aura scored 1 / 10 — only mobile-friendly. Technical = 0.10. Weighted = 0.010. The Fix Pack auto-installs 7 of the 8 missing items.

Caveats

A high Technical score doesn't earn AI recommendation by itself; it's a floor, not a ceiling. AI can find you without schema — it just costs more compute and the answer is less reliable.

Step-by-step walkthrough
  1. Run a 10-point checklist: LocalBusiness JSON-LD (2 pts), then 1 pt each for FAQ markup, robots.txt allowing GPTBot/PerplexityBot, llms.txt, sitemap.xml, OpenGraph, mobile-responsive, canonical URLs, TTFB < 1s.

  2. Aura passed only one item: mobile-responsive. Total: 1 / 10.

  3. Compute: 1 ÷ 10 = 0.10. Apply weight: 0.10 × 0.10 = 0.010.

  4. The Fix Pack auto-installs 7 of the 8 missing items (the remaining 1 requires manual review). Re-test after install raises Technical to a projected 0.80.
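The checklist scoring can be sketched as below. Note the 2-point weight on JSON-LD is an assumption made so the listed items sum to the stated 10-point total; Caboo's exact point table may differ.

```python
# Point split is an assumption of this sketch: JSON-LD weighted 2 so the
# listed items total the stated 10 points.
CHECKLIST = {
    "localbusiness_jsonld": 2, "faq_markup": 1, "robots_allows_ai_bots": 1,
    "llms_txt": 1, "sitemap_xml": 1, "opengraph": 1,
    "mobile_responsive": 1, "canonical_urls": 1, "ttfb_under_1s": 1,
}

def technical(passed: set) -> float:
    """Points earned ÷ total checklist points."""
    total = sum(CHECKLIST.values())
    return sum(v for k, v in CHECKLIST.items() if k in passed) / total

t = technical({"mobile_responsive"})   # Aura passed one item → 0.10
```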

Category Percentile § 3.6

0.05 weight
What we measure

How the business's combined sub-scores rank against similar businesses in the same {category, region, size}. Pure relative measurement.

How we measure it

For each business in the cohort, we combine its other five sub-scores (§3.1–§3.5) and rank the results. Percentile = (cohort size − rank) ÷ cohort size, so higher is better. We require a minimum cohort of 30 peer businesses; smaller cohorts return "insufficient cohort" instead of forcing a number.

Worked example

Aura ranked 88 of 142 med spas in the Phoenix metro. Percentile = (142 − 88) ÷ 142 = 0.380. Weighted = 0.019. Reported as "Bottom 38%."

Caveats

Cohort definition matters. We currently bin by SIC-equivalent category and ZIP3 region; we are testing finer cohorts in v3.2. A business in a sparsely-scanned category may briefly dominate its cohort without dominating its actual market.

Step-by-step walkthrough
  1. Identify the cohort: med spas in Phoenix metro (ZIP3 850-855), SIC-equivalent Personal Care Services. Cohort size: 142 peer businesses (≥30 minimum required).

  2. Combine each cohort member's other five sub-scores and rank from best to worst. Aura ranked 88 of 142 — worse than 87 peers, better than 54.

  3. Compute: (142 − 88) ÷ 142 = 0.380. Apply weight: 0.380 × 0.05 = 0.019.

  4. Surface as plain language: "Bottom 38% of Phoenix-metro med spas." Cohorts smaller than 30 return "insufficient cohort" rather than a forced number.
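The cohort rule can be sketched as below, with rank 1 meaning the best performer (the convention that matches the worked example's arithmetic).

```python
def percentile(rank: int, cohort_size: int, min_cohort: int = 30):
    """Inverted rank position; rank 1 = best performer.

    Cohorts under the minimum return None, mirroring the
    'insufficient cohort' result rather than forcing a number.
    """
    if cohort_size < min_cohort:
        return None
    return (cohort_size - rank) / cohort_size

p = percentile(rank=88, cohort_size=142)   # (142 - 88) / 142 ≈ 0.380
```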

§ 4. What we don't measure

The honest limits.

Most methodologies hide their limits. We lead with them — because the alternative is finding out from a buyer who already lost trust. Three things our score does not capture, plus how we handle each.

What individual users actually see.

AI assistants personalize based on memory, location, search history, account preferences, and account-tier model routing. Our scans test the layer that's measurable, repeatable, and comparable across businesses — but real consumer experience may differ. A buyer in your ZIP with a long ChatGPT history may see different answers than our scan does. We document this on every dossier.

Model improvement over time.

Models improve every quarter. A score of 47 today may be 51 next month without your business changing — because newer models get better at finding small businesses. We address this through methodology versioning (§ Changelog, forthcoming) and by re-scoring historical scans against current weights when major changes ship.

Hallucinated business names.

If AI invents a fictional business with a name similar to yours or your competitors', we may count it as a competitor mention. We flag suspected hallucinations as "unverified" on the dossier rather than dropping them — that's a transparency choice. Future versions will cross-check against state business registries to catch them earlier.

Whether your business is good.

The Caboo Score measures findability and recommendation surface, not quality. A deeply-recommended business with terrible reviews is still going to lose customers; the score won't catch that. Quality is your customers' judgment. We don't approximate it.

§ 5. Signature

Maintained, signed, challengeable.

This methodology is owned by the Caboo Methodology Committee and is the document Caboo is held to whenever a score it produces is challenged. If you find a methodological error — a math mistake, a weight that's no longer defensible, a missing limit we should be acknowledging — we want to hear it.

Caboo Methodology Committee

Signed v3.1

Published Apr 30, 2026 · this methodology applies to all scans run on or after this date. Earlier scans were run against v3.0 and are re-scored automatically when major weights change.

To challenge anything in this document, write to [email protected]. We respond within five business days. Substantive corrections are versioned and announced.

Document SR-METH-001 · v3.1 · Apr 30, 2026

Run your business through this methodology. Free, sixty seconds.

Run live scan →