Most AI visibility tools query base models without web search enabled. For companies outside the US — especially Swiss and DACH B2B — this produces numbers that are actively misleading.
Open any AI visibility platform and you will see dashboards full of citation counts, mention scores, and share-of-voice percentages. It looks rigorous. It feels like data. For many companies — particularly those outside the United States — it is measuring something that barely reflects the world their customers live in.
The core problem is simple: most AI visibility tools query large language models in their base form, without web search enabled. They count how often your brand appears in responses generated from frozen training data, label this "AI visibility," and charge you a monthly fee to track it. This approach has three critical flaws that compound each other — and they hit non-US companies hardest.
Every major LLM has a training cutoff. The data a model learned from was collected, cleaned, and baked in months — sometimes over a year — before you ever typed a query. When an AI visibility tool sends prompts to a base model without search, it is asking "what did this model know in its training data?" not "what would a real buyer find today?"
If you published an excellent case study last month, redesigned your website last quarter, or earned a major press mention last week, none of that exists in a base model query. A tool that measures base model responses will report you as invisible when you may, in practice, be prominently featured in every AI-generated answer a live buyer receives today.
This matters because content freshness is one of the highest-leverage activities in generative engine optimisation. You cannot measure the impact of recent work if your measurement tool ignores recent work.
The training corpora for the major LLMs skew dramatically toward English-language, US-origin content. This is not a bias anyone deliberately introduced — it is simply a reflection of where the internet's content volume has historically concentrated. The practical consequence for a Swiss or German B2B company is stark: your firm may have near-zero presence in base model training data, not because you are unknown in your market, but because the model never saw you.
A base model query for "best ERP providers for Swiss manufacturing" will surface companies with large English-language footprints, regardless of their actual market presence in Switzerland. A search-enabled query for the same question will pull from Swiss business directories, German-language trade publications, and your own website — a completely different information landscape.
When an AI visibility tool tells a Basel-based software company that their citation rate is 3%, the honest interpretation is: "3% in frozen, US-weighted training data." The number that actually matters — how often you appear when a Swiss procurement manager asks an AI assistant the same question with real-time search — is not being measured at all.
This is why understanding how AI search actually works — specifically when models retrieve live data versus recall training data — is foundational before interpreting any AI visibility metric.
Consider how your actual buyers use AI tools today. ChatGPT's default mode includes web search. Perplexity has always been built around real-time retrieval. Google's AI Mode draws from its live index. Microsoft Copilot integrates Bing search throughout. The scenario where a buyer asks an AI question and receives an answer based purely on training data — with no search, no retrieval, no fresh content — is increasingly rare.
By querying base models, AI visibility tools are measuring a user experience that is fading from relevance. They are telling you how visible you are in a mode that fewer and fewer real buyers actually encounter.
For a Swiss B2B company, this distinction is not academic. Your buyers are asking ChatGPT "welche CRM-Anbieter eignen sich für Schweizer KMU?" ("which CRM providers are suitable for Swiss SMEs?") or "what are the best Swiss-made industrial sensors?" and receiving answers drawn from live search results. Your website, your recent articles, your listings in Swiss business directories: these are the signals that matter. A tool that ignores them is not measuring your AI visibility. It is measuring your AI history.
There is a structural reason so many tools default to base model queries: cost. A search-enabled query — one that instructs the AI to browse the web before answering — costs roughly 100 to 150 times more than a base model query. At that price ratio, a tool charging CHF 79 per month simply cannot run search-enabled queries at the volume required to produce statistically meaningful data across hundreds of keywords and multiple AI models.
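The arithmetic makes the constraint concrete. A quick back-of-the-envelope calculation, using the 100-150x ratio from above and an assumed (not published) base query price:

```python
# Illustrative cost comparison; the per-query price is an assumption, not a published rate.
keywords = 200          # prompts tracked
models = 4              # AI platforms queried
runs_per_month = 4      # weekly measurement

queries = keywords * models * runs_per_month   # 3,200 queries per month

base_cost_per_query = 0.01   # assumed base model query cost (USD)
search_multiplier = 125      # midpoint of the 100-150x ratio cited above

base_total = queries * base_cost_per_query                       # ~$32/month
search_total = queries * base_cost_per_query * search_multiplier # ~$4,000/month

print(f"Base model:     ${base_total:,.0f}/month")
print(f"Search-enabled: ${search_total:,.0f}/month")
```

Under these assumptions, search-enabled measurement costs thousands of francs per month in queries alone. A CHF 79 subscription cannot fund it.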
So the industry has quietly converged on base model queries. The dashboards look the same. The reports look credible. The methodology footnote is buried, if it appears at all. The result is a generation of AI visibility products that are optimised for affordability and optics, not for accuracy.
This is not a criticism of any specific company — it is a structural incentive problem. When the economically viable measurement approach diverges from the methodologically correct one, market pressure tends to reward the cheaper approach. Understanding this dynamic helps you ask better questions when evaluating any AI visibility tool.
For companies operating in Switzerland, Germany, and Austria, base model measurement is particularly unreliable, for a compounding set of reasons:

- The training corpora behind the major LLMs skew heavily toward English-language, US-origin content, so even firms that lead their DACH niche may be nearly absent from them.
- Swiss buyers query in German, French, and Italian as well as English, and AI responses differ by language; tools that prompt base models only in English miss most of those conversations.
- The sources that best describe the DACH market (Swiss business directories, German-language trade publications, companies' own websites) are reachable by live search but underrepresented in training data.
- Training cutoffs mean recent German-language content and local press coverage do not exist in a base model at all.
The practical implication: a Swiss company that has invested in strong German-language content, local listings, and a well-structured website may be dramatically underestimating its actual AI visibility because the tool measuring it never uses search. Conversely, a company that is genuinely weak in real-time AI responses may be reassured by inflated base model scores. Neither outcome serves your strategy.
Before accepting a citation count or visibility score at face value, ask the tool provider four questions:

1. Do your queries run with web search enabled, or against the base model?
2. In which language are prompts sent for my market?
3. Do you simulate my buyers' location, such as a Swiss user context?
4. Can I see the full AI response text behind each score, not just the aggregate number?
These questions are not gotchas. They are the foundation of methodological hygiene in a young and still-maturing measurement category. The tools that answer them clearly deserve more trust.
The correct approach — harder and more expensive, but the only one that reflects reality — uses search-enabled queries, tests in the buyer's language, simulates the buyer's locality, and shows the actual response text alongside any aggregate scoring.
This means running queries through ChatGPT with web search active. It means sending German queries for German-speaking markets and Swiss-targeted queries for Swiss buyers. It means recording what the AI actually said, not just whether your brand appeared. And it means doing this consistently over time, because AI visibility changes week to week as models update, competitors publish, and search indices refresh.
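In API terms, the difference is often a single parameter. A minimal sketch, assuming the OpenAI Python SDK's Responses API and its web search tool; the model name and tool type string are assumptions to verify against current documentation:

```python
# Sketch: the same prompt with and without live web search.
# Assumes the OpenAI Responses API; verify the tool type string in current docs.
from openai import OpenAI

client = OpenAI()
prompt = "Welche CRM-Anbieter eignen sich für Schweizer KMU?"

# Base model query: answers only from frozen training data.
base = client.responses.create(model="gpt-4o", input=prompt)

# Search-enabled query: the model retrieves live web results before answering.
live = client.responses.create(
    model="gpt-4o",
    input=prompt,
    tools=[{"type": "web_search_preview"}],
)

print("BASE:", base.output_text)
print("LIVE:", live.output_text)
```

Running both variants side by side on the same prompt is the fastest way to see how different the two information landscapes are.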
For DACH companies in particular, this methodology frequently surfaces a different picture than base model measurement. Companies that had written off AI as a channel where they were structurally disadvantaged often discover they have meaningful presence in real-time search-enabled responses. Companies that felt comfortable based on strong base model scores sometimes discover their real-time visibility is much thinner. Either way, the measurement is actionable.
If you are trying to understand whether your content investments are reaching buyers who use AI tools, looking at real search-enabled AI responses is the only way to know for certain. Base model scores tell you about the past. Search-enabled scores tell you about now.
To illustrate how dramatically results can differ, consider a common Swiss B2B query about IT security consulting firms in Switzerland, run both ways.

**Base model query (no search).** The AI lists primarily large, internationally known firms: Deloitte, PwC, Accenture, with perhaps one or two Swiss-specific mentions like InfoGuard or Compass Security. The response reflects global training data, in which large firms with extensive English-language web presence dominate. Smaller Swiss specialists, even market leaders in their niche, are absent because they had too little English-language presence in the model's training data.

**Search-enabled query.** The AI searches the web and returns a different picture: a mix of established Swiss firms (InfoGuard, Redguard, terreActive), specialised boutiques, and the large international firms. The response reflects the actual Swiss market landscape because it draws from Swiss business directories, German-language industry publications, and the companies' own websites: sources that real-time search can access but that base model training data may underrepresent.
For a mid-sized Swiss IT security consultancy, the base model measurement would show zero visibility. The search-enabled measurement might show 40-60% visibility. The difference is not a rounding error — it is the difference between concluding "we are invisible and AI does not work for us" and "we have meaningful AI presence that we can build on."
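For clarity about what such percentages mean: a visibility rate is typically just the share of tested prompts whose live response mentions the company. The numbers below are illustrative:

```python
# Illustrative visibility rate: share of search-enabled responses naming the brand.
prompts_tested = 15   # search-enabled prompts run this measurement cycle
brand_mentions = 7    # responses that actually named the company

visibility = brand_mentions / prompts_tested
print(f"Visibility: {visibility:.0%}")  # -> Visibility: 47%
```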
If you are currently paying for an AI visibility tool and want to verify its methodology, here is a practical test you can run in 30 minutes:

1. Pick five to ten prompts the tool already tracks for you.
2. Run each prompt manually in ChatGPT (with web search active) and Perplexity, in both English and German.
3. Record whether your company appears in each live response.
4. Compare your results with the scores the tool reports for the same prompts.

If the live responses and the tool's scores diverge significantly, the tool is likely measuring base model output rather than what your buyers actually see.
For Swiss B2B companies, accurate AI visibility measurement must address all five of the following dimensions:
| Dimension | What It Means | Why It Matters for Swiss B2B |
|---|---|---|
| Search mode | Base model vs search-enabled | Search-enabled reflects what buyers actually see; base model reflects frozen, US-skewed training data |
| Language | English, German, French, Italian | Swiss buyers query in multiple languages; AI responses differ by language |
| Locality | Swiss, DACH, global | Swiss-targeted queries produce different results than global ones |
| Platform | ChatGPT, Claude, Perplexity, Google AI | Each platform draws from different indices and produces different recommendations |
| Time | Point-in-time vs ongoing tracking | AI visibility changes weekly as models update and competitors publish |
A tool that addresses all five dimensions gives you a measurement that accurately reflects your buyers' experience. A tool that misses even one dimension introduces systematic bias that can lead you to invest in the wrong activities.
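One way to keep all five dimensions explicit is to pin each one to every query you run. A hypothetical sketch; the field names and values are illustrative, not any tool's actual schema:

```python
# Hypothetical query spec: every measurement pins down all five dimensions.
from dataclasses import dataclass

@dataclass
class VisibilityQuery:
    prompt: str           # the buyer-relevant question
    search_enabled: bool  # search mode: True = live retrieval, False = base model
    language: str         # "de", "fr", "it", or "en"
    locality: str         # "CH", "DACH", or "global"
    platform: str         # "chatgpt", "claude", "perplexity", "google-ai"
    run_date: str         # time: repeat weekly to track change

query = VisibilityQuery(
    prompt="Welche CRM-Anbieter eignen sich für Schweizer KMU?",
    search_enabled=True,
    language="de",
    locality="CH",
    platform="chatgpt",
    run_date="2025-01-06",  # illustrative date
)
```

Any query missing one of these fields is, by definition, measuring something underspecified.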
The question "how visible am I in AI?" is only meaningful if it specifies: visible to whom, in which language, in which location, on which platform, at what point in time. An AI visibility tool that does not answer all five dimensions is giving you a partial answer — and for companies outside the US, it is usually the least relevant part of the answer.
Measure what your buyers actually experience. That means search-enabled queries, in their language, simulating their location, showing the actual response. Everything else is a proxy for something that does not quite exist anymore.
**Why do most AI visibility tools rely on base model queries?**

Cost. A search-enabled AI query costs approximately 100-150 times more than a base model query. For a tool that needs to run thousands of queries per month across multiple models to provide meaningful data, the cost difference is enormous. Most AI visibility tool providers have optimised for economic viability and visually impressive dashboards rather than methodological accuracy. This is not necessarily dishonest: many tools were built before the importance of search-enabled measurement was fully understood. But as the field matures, the gap between base model and search-enabled measurement is becoming increasingly well known, and tools that do not address it are providing data of diminishing value.
**Does per4mx use search-enabled queries?**

Yes. per4mx queries AI models with search capabilities enabled, in the language and locality of your target market. This means the results you see in per4mx reflect what a real Swiss buyer would see when they ask the same question. For German-language queries, per4mx sends German prompts and captures the German-language AI response. For Swiss-targeted queries, it simulates a Swiss user context. This methodology is more expensive to operate but produces data that accurately represents your buyers' experience, which is the only data worth acting on.
**Can I keep using my current tool alongside manual testing?**

Yes, and this is a pragmatic approach if you are locked into a tool contract. Use the tool for trend tracking (is my score going up or down over time?) while supplementing it with manual search-enabled testing each month. Run ten to fifteen prompts across ChatGPT, Claude, and Perplexity manually, in both English and German, and record the actual responses. Compare your manual findings with what the tool reports. The manual testing gives you ground truth; the tool gives you trend data. Together, they provide a more complete picture than either alone.
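One simple way to structure that comparison, sketched here with made-up prompts and results: log each prompt's manual outcome next to the tool's claim and count the disagreements.

```python
# Illustrative comparison of manual ground truth vs. tool-reported visibility.
# All prompts and results are made up.
manual = {  # did the brand appear in the live, search-enabled response?
    "best ERP providers for Swiss manufacturing": True,
    "welche CRM-Anbieter eignen sich für Schweizer KMU?": True,
    "top IT security consultancies Switzerland": False,
}
tool = {    # what the tool's dashboard reports for the same prompts
    "best ERP providers for Swiss manufacturing": False,
    "welche CRM-Anbieter eignen sich für Schweizer KMU?": False,
    "top IT security consultancies Switzerland": False,
}

disagreements = [p for p in manual if manual[p] != tool.get(p)]
print(f"{len(disagreements)} of {len(manual)} prompts disagree:")
for p in disagreements:
    print(f"  {p!r}: live={manual[p]}, tool={tool[p]}")
```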
**Can I measure my AI visibility manually, without a tool?**

Yes. Open ChatGPT (with a Plus subscription for guaranteed search access), Claude, Perplexity, and Google AI. Type your buyer-relevant prompts and record the responses. This is the most accurate measurement possible: you are seeing exactly what your buyers see. The limitation of manual testing is scale and consistency: it is hard to test dozens of prompts weekly across four platforms and track trends over time. This is where a properly designed tool like per4mx adds value, not by being more accurate than your own eyes, but by automating the process at a scale that manual testing cannot sustain.
**How do I convince colleagues or leadership that our current numbers are misleading?**

The most effective demonstration is a live comparison. In a meeting, pull up your AI visibility tool's dashboard showing your score. Then open ChatGPT and Perplexity and run one of the same prompts the tool measured. If the live AI response differs significantly from what the tool reports (for example, you are visible in the live response but the tool says you are invisible), the point makes itself. For Swiss companies, running the same prompt in German often produces the most dramatic difference between base model scores and real-world visibility. This ten-minute demonstration is usually sufficient to justify investing in accurate measurement methodology.
Ready to take action?
See how ChatGPT, Claude, Perplexity, and Gemini describe your company today. Get a free visibility report in minutes.