Every CMO eventually asks: "what is our AI visibility?"
They want a number. Ideally one number. Plotted over time. The kind of metric you can put in the quarterly deck and say "we went from 28 to 42 percent." The kind of number that Marketing Ops can defend in a board meeting.
That number does not exist by default. You have to build it.
This post walks through building it in around 80 lines of code. The math is simple. The data layer (calling six AI engines and parsing whether your brand is in the answer) is the work that takes weeks. We will skip that and call an API that already does it.
What "AI visibility" actually means
An AI visibility score is the fraction of category-relevant AI answers that mention your brand. Three components.
The query set. The questions your customers actually ask AI engines about your category. Not the keywords SEO tools surface. The conversational questions humans type into ChatGPT.
The surfaces. Where the answers come from. Today that means six: ChatGPT, Claude, Gemini, Perplexity, Google AI Overviews, and Bing Copilot. Some tools include Google AI Mode as a seventh.
The roll-up. For each (query, surface) cell, did your brand get mentioned? Yes is 1, no is 0. Sum, divide by total cells, multiply by 100. That is the score. For example, 25 mentions across 60 cells is 25 / 60 × 100 ≈ 42.
Why most teams get this wrong
Two patterns dominate.
Manual sampling. Someone runs 5 queries through ChatGPT once a week, pastes the results into a spreadsheet, and calls it AI visibility tracking. The numbers are unreliable because the sample is too small and the surface coverage is too narrow.
Vendor scores. Tools like HubSpot AEO and Profound publish their own visibility scores. The numbers are fine. The problem is you cannot inspect what went into them. Did they include AI Overviews? What query weights? What sample size? You are trusting their methodology blindly.
The case for building it yourself

A visibility score you can compute yourself, from queries you defined, on a cadence you control, beats any black-box vendor score.
The build
Aim for 20 to 50 queries. Below 20 the score is noisy. Above 100 you spend more and learn less. The sweet spot is 30.
Three buckets to fill: category-defining ("best CRM for small business"), comparison ("Salesforce vs Pipedrive"), and use-case ("CRM for a 5-person sales team"). Aim for roughly equal numbers in each. Save the set as queries.json so the snapshot code below can import it:
{
  "brand": "Pipedrive",
  "queries": [
    "best CRM for small business",
    "best CRM for sales teams",
    "CRM for a 5-person startup",
    "Pipedrive vs HubSpot",
    "Pipedrive vs Salesforce",
    "alternatives to Salesforce",
    "what is the cheapest CRM with email automation",
    "best simple CRM 2026",
    "lightweight CRM for consultants",
    "CRM with kanban pipeline view"
  ]
}

Call /v1/check with mode: all_live and your brand name. The response includes a providers map with every surface, and each surface returns a mentioned boolean for the tracked brand.
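The exact payload is the API's to define, but based on how the scoring code below consumes it, each providers entry looks roughly like this (two surfaces shown):

{
  "providers": {
    "chatgpt": {
      "brands": [{ "name": "Pipedrive", "mentioned": true }]
    },
    "claude": {
      "brands": [{ "name": "Pipedrive", "mentioned": false }]
    }
  }
}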
import config from "./queries.json" with { type: "json" }; // Node 20.10+; older versions used `assert`

// The six surfaces the score covers.
const SURFACES = ["chatgpt", "claude", "gemini", "perplexity", "ai_overview", "bing_copilot"];

// Run one query across all live surfaces and return the per-surface results.
async function checkQuery(query) {
  const res = await fetch("https://api.mentionsapi.com/v1/check", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.MENTIONSAPI_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      mode: "all_live",
      query,
      track_brands: [config.brand],
    }),
  });
  if (!res.ok) throw new Error(`check failed (${res.status}): ${await res.text()}`);
  const data = await res.json();
  return data.providers;
}

// Sequential on purpose: one query at a time is gentler on rate limits.
const presence = {};
for (const q of config.queries) {
  presence[q] = await checkQuery(q);
}
console.log(JSON.stringify(presence, null, 2));

For each surface, divide the number of queries that mentioned your brand by the number of queries that surface answered. That is the presence rate per surface. For the overall score, pool every (query, surface) cell: total mentions over total cells. When every surface answers every query, the pooled figure equals the simple average of the per-surface rates.
// Roll the presence map up into per-surface rates and one overall score.
function computeScore(presence, brand) {
  const perSurface = {};
  let totalCells = 0;
  let totalMentions = 0;
  for (const surface of SURFACES) { // reuses the SURFACES list defined above
    let mentions = 0, total = 0;
    for (const query of Object.keys(presence)) {
      const cell = presence[query][surface];
      if (!cell) continue; // the surface returned nothing for this query; skip the cell
      total++;
      const brandHit = cell.brands?.find((b) => b.name === brand);
      if (brandHit?.mentioned) mentions++;
    }
    perSurface[surface] = total > 0 ? mentions / total : 0;
    totalCells += total;
    totalMentions += mentions;
  }
  return {
    overall: totalCells > 0 ? totalMentions / totalCells : 0,
    perSurface,
  };
}

console.log(computeScore(presence, "Pipedrive"));

Sample output:
{
  "overall": 0.42,
  "perSurface": {
    "chatgpt": 0.50,
    "claude": 0.30,
    "gemini": 0.40,
    "perplexity": 0.60,
    "ai_overview": 0.40,
    "bing_copilot": 0.30
  }
}

Pipedrive has a 42% AI visibility score. ChatGPT and Perplexity are strongest. Claude and Bing Copilot are the underinvested surfaces.
Save the score with the date. After 30 days you have a trend line worth showing an executive. Most teams use Grafana, Plausible, or just SQLite plus a tiny Recharts page.
import fs from "node:fs/promises";

const today = new Date().toISOString().slice(0, 10); // YYYY-MM-DD
const score = computeScore(presence, "Pipedrive");
const row = { date: today, ...score };

// Append today's row; fall back to an empty log if the file does not exist yet.
const log = JSON.parse(await fs.readFile("scores.json", "utf8").catch(() => "[]"));
log.push(row);
await fs.writeFile("scores.json", JSON.stringify(log, null, 2));

Plot overall as the headline metric. Plot per-surface as the breakdown. The first 30 days of trend data is what makes the deck slide work.
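If you go the Recharts route, the page can be tiny. A minimal sketch, assuming scores.json is bundled with the page (the component name and file path are illustrative):

import { LineChart, Line, XAxis, YAxis, Tooltip, CartesianGrid } from "recharts";
import scores from "./scores.json";

// Headline chart: overall visibility over time, rendered as a percentage.
export default function VisibilityTrend() {
  return (
    <LineChart width={640} height={280} data={scores}>
      <CartesianGrid strokeDasharray="3 3" />
      <XAxis dataKey="date" />
      <YAxis domain={[0, 1]} tickFormatter={(v) => `${Math.round(v * 100)}%`} />
      <Tooltip />
      <Line type="monotone" dataKey="overall" dot={false} />
    </LineChart>
  );
}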
What this costs
30 queries × $0.50 × 30 days = $450 worst case, if every call runs live against every engine. In practice, cache hits on a daily cadence cut that sharply, and the real bill lands closer to $30 to $50 a month. Add 20 more queries for better statistical significance and you are still under $80.
Where most builds get stuck
Two specific traps.
Picking the wrong queries. Most teams start with the broad ones ("best CRM"). Those are the queries every brand competes on. Your visibility is naturally low. Add 10 long-tail conversational queries that match how your specific customers ask. Those move the score where the broad queries cannot.
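Concretely, for the Pipedrive set above, long-tail additions might look like this (hypothetical queries; write yours the way your buyers actually phrase things):

"does Pipedrive integrate with Gmail and Google Calendar",
"easiest CRM to set up without a dedicated admin",
"CRM that works for a two-person founding team",
"which CRM has the best mobile app for field sales"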
Over-engineering the rollup. Some teams add weighting before they have 30 days of data. The weights are arbitrary, the math feels rigorous, and the score is misleading. Resist this. Start with equal weighting. Layer weighting in month three when you have evidence.
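When month three comes, the mechanics are small. A minimal sketch of a weighted roll-up over the perSurface rates from computeScore, with placeholder weights (the numbers here are illustrative, not a recommendation):

// Hypothetical surface weights; replace with evidence from your own data.
const WEIGHTS = { chatgpt: 2, claude: 1, gemini: 1, perplexity: 1.5, ai_overview: 2, bing_copilot: 0.5 };

// Weighted average of per-surface presence rates.
function weightedOverall(perSurface) {
  let weightedSum = 0, weightTotal = 0;
  for (const [surface, rate] of Object.entries(perSurface)) {
    const w = WEIGHTS[surface] ?? 1; // default weight for any surface not listed
    weightedSum += w * rate;
    weightTotal += w;
  }
  return weightTotal > 0 ? weightedSum / weightTotal : 0;
}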
Frequently asked questions
What is an AI visibility score?
The fraction of category-relevant AI answers that mention your brand, across a defined query set and the surfaces you track, expressed as a percentage.

How is this different from share of voice?
Share of voice compares your mentions against competitors' mentions in the same answers. The score built here is simpler: a binary did-we-appear rate for your brand alone.

How many queries do I need to make the score meaningful?
20 to 50. Below 20 the score is noisy; 30 is the sweet spot.

Do I need to weight surfaces differently?
Not at first. Start with equal weighting and layer in weights around month three, once you have evidence for them.

What does the score look like for a typical SaaS brand?
The worked example above lands at 42% overall, with per-surface rates between 30% and 60%. Your range will vary with category competitiveness and brand maturity.

How often should I publish the score internally?
Compute it daily, share it monthly. You need roughly 30 days of data before the trend line is worth showing an executive.
Ship it this week
Day one: build the query set and the snapshot pipeline. Day two: roll up the score. Day three: schedule daily and start collecting trend data. By day thirty, you have a real visibility number with a trend line. That is the deck slide your CMO has been asking for.
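If you want the day-three scheduler in Node rather than system cron, node-cron works. A sketch, assuming the snapshot and roll-up above are wrapped in a runSnapshot() function (a hypothetical name, not part of any API here):

import cron from "node-cron"; // npm install node-cron

// Run the full pipeline every morning at 06:00.
cron.schedule("0 6 * * *", async () => {
  await runSnapshot(); // hypothetical wrapper around the snapshot, roll-up, and append steps
});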
The build is small. The deliverable is large.