Every CMO eventually asks: "what is our AI visibility?"
They want a number. Ideally one number. Plotted over time. The kind of metric you can put in the quarterly deck and say "we went from 28 to 42 percent." The kind of number that Marketing Ops can defend in a board meeting.
That number does not exist by default. You have to build it.
This post walks through building it in around 80 lines of code. The math is simple. The data layer (calling six AI engines and parsing whether your brand is in the answer) is the work that takes weeks. We will skip that and call an API that already does it.
What "AI visibility" actually means
An AI visibility score is the fraction of category-relevant AI answers that mention your brand. Three components.
The query set. The questions your customers actually ask AI engines about your category. Not the keywords SEO tools surface. The conversational questions humans type into ChatGPT.
The surfaces. Where the answers come from. Today that means six: ChatGPT, Claude, Gemini, Perplexity, Google AI Overviews, and Bing Copilot. Some tools include Google AI Mode as a seventh.
The roll-up. For each (query, surface) cell, did your brand get mentioned? Yes is 1, no is 0. Sum, divide by total cells, multiply by 100. That is the score. For example, 25 mentions across 60 cells is 25 / 60 × 100 ≈ 42.
Why most teams get this wrong
Two patterns dominate.
Manual sampling. Someone runs 5 queries through ChatGPT once a week, pastes the results into a spreadsheet, and calls it AI visibility tracking. The numbers are unreliable because the sample is too small and the surface coverage is too narrow.
Vendor scores. Tools like HubSpot AEO and Profound publish their own visibility scores. The numbers are fine. The problem is you cannot inspect what went into them. Did they include AI Overviews? What query weights? What sample size? You are trusting their methodology blindly.
The case for building it yourself

A visibility score you can compute yourself, from queries you defined, on a cadence you control, beats any black-box vendor score.
The build
Aim for 20 to 50 queries. Below 20 the score is noisy. Above 100 you spend more and learn less. The sweet spot is 30.
Three buckets to fill: category-defining ("best CRM for small business"), comparison ("Salesforce vs Pipedrive"), and use-case ("CRM for a 5-person sales team"). Aim for roughly equal numbers in each. Save the set as queries.json so the snapshot code below can import it:
{
  "brand": "Pipedrive",
  "queries": [
    "best CRM for small business",
    "best CRM for sales teams",
    "CRM for a 5-person startup",
    "Pipedrive vs HubSpot",
    "Pipedrive vs Salesforce",
    "alternatives to Salesforce",
    "what is the cheapest CRM with email automation",
    "best simple CRM 2026",
    "lightweight CRM for consultants",
    "CRM with kanban pipeline view"
  ]
}

Call /v1/check with mode: all_live and your brand name. The response includes a providers map with every surface, and each surface returns a mentioned boolean for the tracked brand.
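The exact payload is the API's to define, but based on how the scoring code below consumes it, each providers entry looks roughly like this (two surfaces shown):

{
  "providers": {
    "chatgpt": {
      "brands": [{ "name": "Pipedrive", "mentioned": true }]
    },
    "claude": {
      "brands": [{ "name": "Pipedrive", "mentioned": false }]
    }
  }
}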
import config from "./queries.json" with { type: "json" }; // Node 20.10+; older versions used `assert`

// The six surfaces the score covers.
const SURFACES = ["chatgpt", "claude", "gemini", "perplexity", "ai_overview", "bing_copilot"];

// Run one query across all live surfaces and return the per-surface results.
async function checkQuery(query) {
  const res = await fetch("https://api.mentionsapi.com/v1/check", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.MENTIONSAPI_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      mode: "all_live",
      query,
      track_brands: [config.brand],
    }),
  });
  if (!res.ok) throw new Error(`check failed (${res.status}): ${await res.text()}`);
  const data = await res.json();
  return data.providers;
}

// Sequential on purpose: one query at a time is gentler on rate limits.
const presence = {};
for (const q of config.queries) {
  presence[q] = await checkQuery(q);
}
console.log(JSON.stringify(presence, null, 2));

For each surface, divide the number of queries that mentioned your brand by the number of queries that surface answered. That is the presence rate per surface. For the overall score, pool every (query, surface) cell: total mentions over total cells. When every surface answers every query, the pooled figure equals the simple average of the per-surface rates.
// Roll the presence map up into per-surface rates and one overall score.
function computeScore(presence, brand) {
  const perSurface = {};
  let totalCells = 0;
  let totalMentions = 0;
  for (const surface of SURFACES) { // reuses the SURFACES list defined above
    let mentions = 0, total = 0;
    for (const query of Object.keys(presence)) {
      const cell = presence[query][surface];
      if (!cell) continue; // the surface returned nothing for this query; skip the cell
      total++;
      const brandHit = cell.brands?.find((b) => b.name === brand);
      if (brandHit?.mentioned) mentions++;
    }
    perSurface[surface] = total > 0 ? mentions / total : 0;
    totalCells += total;
    totalMentions += mentions;
  }
  return {
    overall: totalCells > 0 ? totalMentions / totalCells : 0,
    perSurface,
  };
}

console.log(computeScore(presence, "Pipedrive"));

Sample output:
{
  "overall": 0.42,
  "perSurface": {
    "chatgpt": 0.50,
    "claude": 0.30,
    "gemini": 0.40,
    "perplexity": 0.60,
    "ai_overview": 0.40,
    "bing_copilot": 0.30
  }
}

Pipedrive has a 42% AI visibility score. ChatGPT and Perplexity are strongest. Claude and Bing Copilot are the underinvested surfaces.
Save the score with the date. After 30 days you have a trend line worth showing an executive. Most teams use Grafana, Plausible, or just SQLite plus a tiny Recharts page.
import fs from "node:fs/promises";

const today = new Date().toISOString().slice(0, 10); // YYYY-MM-DD
const score = computeScore(presence, "Pipedrive");
const row = { date: today, ...score };

// Append today's row; fall back to an empty log if the file does not exist yet.
const log = JSON.parse(await fs.readFile("scores.json", "utf8").catch(() => "[]"));
log.push(row);
await fs.writeFile("scores.json", JSON.stringify(log, null, 2));

Plot overall as the headline metric. Plot per-surface as the breakdown. The first 30 days of trend data is what makes the deck slide work.
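If you go the Recharts route, the page can be tiny. A minimal sketch, assuming scores.json is bundled with the page (the component name and file path are illustrative):

import { LineChart, Line, XAxis, YAxis, Tooltip, CartesianGrid } from "recharts";
import scores from "./scores.json";

// Headline chart: overall visibility over time, rendered as a percentage.
export default function VisibilityTrend() {
  return (
    <LineChart width={640} height={280} data={scores}>
      <CartesianGrid strokeDasharray="3 3" />
      <XAxis dataKey="date" />
      <YAxis domain={[0, 1]} tickFormatter={(v) => `${Math.round(v * 100)}%`} />
      <Tooltip />
      <Line type="monotone" dataKey="overall" dot={false} />
    </LineChart>
  );
}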
What this costs
30 queries × $0.50 × 30 days = $450 worst case, if every call runs live against every engine. In practice, cache hits on a daily cadence cut that sharply, and the real bill lands closer to $30 to $50 a month. Add 20 more queries for better statistical significance and you are still under $80.
Where most builds get stuck
Two specific traps.
Picking the wrong queries. Most teams start with the broad ones ("best CRM"). Those are the queries every brand competes on. Your visibility is naturally low. Add 10 long-tail conversational queries that match how your specific customers ask. Those move the score where the broad queries cannot.
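Concretely, for the Pipedrive set above, long-tail additions might look like this (hypothetical queries; write yours the way your buyers actually phrase things):

"does Pipedrive integrate with Gmail and Google Calendar",
"easiest CRM to set up without a dedicated admin",
"CRM that works for a two-person founding team",
"which CRM has the best mobile app for field sales"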
Over-engineering the rollup. Some teams add weighting before they have 30 days of data. The weights are arbitrary, the math feels rigorous, and the score is misleading. Resist this. Start with equal weighting. Layer weighting in month three when you have evidence.
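When month three comes, the mechanics are small. A minimal sketch of a weighted roll-up over the perSurface rates from computeScore, with placeholder weights (the numbers here are illustrative, not a recommendation):

// Hypothetical surface weights; replace with evidence from your own data.
const WEIGHTS = { chatgpt: 2, claude: 1, gemini: 1, perplexity: 1.5, ai_overview: 2, bing_copilot: 0.5 };

// Weighted average of per-surface presence rates.
function weightedOverall(perSurface) {
  let weightedSum = 0, weightTotal = 0;
  for (const [surface, rate] of Object.entries(perSurface)) {
    const w = WEIGHTS[surface] ?? 1; // default weight for any surface not listed
    weightedSum += w * rate;
    weightTotal += w;
  }
  return weightTotal > 0 ? weightedSum / weightTotal : 0;
}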
Frequently asked questions
What is an AI visibility score?
The fraction of category-relevant AI answers that mention your brand, across a defined query set and the surfaces you track, expressed as a percentage.

How is this different from share of voice?
Share of voice compares your mentions against competitors' mentions in the same answers. The score built here is simpler: a binary did-we-appear rate for your brand alone.

How many queries do I need to make the score meaningful?
20 to 50. Below 20 the score is noisy; 30 is the sweet spot.

Do I need to weight surfaces differently?
Not at first. Start with equal weighting and layer in weights around month three, once you have evidence for them.

What does the score look like for a typical SaaS brand?
The worked example above lands at 42% overall, with per-surface rates between 30% and 60%. Your range will vary with category competitiveness and brand maturity.

How often should I publish the score internally?
Compute it daily, share it monthly. You need roughly 30 days of data before the trend line is worth showing an executive.
Ship it this week
Day one: build the query set and the snapshot pipeline. Day two: roll up the score. Day three: schedule daily and start collecting trend data. By day thirty, you have a real visibility number with a trend line. That is the deck slide your CMO has been asking for.
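If you want the day-three scheduler in Node rather than system cron, node-cron works. A sketch, assuming the snapshot and roll-up above are wrapped in a runSnapshot() function (a hypothetical name, not part of any API here):

import cron from "node-cron"; // npm install node-cron

// Run the full pipeline every morning at 06:00.
cron.schedule("0 6 * * *", async () => {
  await runSnapshot(); // hypothetical wrapper around the snapshot, roll-up, and append steps
});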
The build is small. The deliverable is large.