Sooner or later somebody hands you a BI evaluation matrix. The columns are Tableau, Power BI, Looker, Sisense, ThoughtSpot, Reveal. The rows are dashboards, embedding, semantic modeling, warehouse compatibility, narrative reporting. The exercise looks reasonable, the cells fill in, and a winner emerges. Six months later the product underperforms in ways nobody can quite articulate. Reports look like screenshots. The ops team opens the dashboard only to click through to other systems. Leadership has stopped opening the executive dashboard.
The matrix is a real exercise; every product team has to run it eventually. The trap is running it first, before clarifying what kinds of reporting need different surfaces.
In most product organizations I’ve watched, skipping that step produces something recognizable: sprawl. Multiple tool licenses, no cohesion across reports and dashboards, users still running workarounds because no single tool quite fit the job it was handed. AI features amplify it. More vendors offer more in-tool AI, there is more incentive to bolt features in across the stack, and more risk of overpaying for AI without knowing where each capability belongs.
An analytics product that serves more than one audience, and that is most of them, is serving three reporting surfaces at once. They differ on five dimensions: audience, cadence, governance, permission model, output format. None of the five compromises gracefully. Force a single tool to span all three surfaces and the result is mediocre on every axis — the kind of failure that never trips an alarm. It just produces a system nobody relies on.
The first decision is not which BI tool. It is recognizing that the surfaces are different products, and giving each the architecture it deserves. The second decision is what sits underneath all three, and it matters more than any of the surface choices. Vendor selection comes last. Once the rest is done, it barely matters.
The three surfaces
Surface A is the program report. A school report, a quarterly client deck, a regulatory filing, a board appendix. The audience is specific (one school, one client, one regulator), and the cadence is per cycle. Every output goes through review: a program lead reads it, comms revises the language, legal sometimes signs off, and it gets published. The artifact persists. Three years from now someone will pull the Spring 2024 version and expect it to render identically. Sentence-level language matters here, because the readers are non-technical and personally invested in what the data says about them.
Surface B is the operator console. The audience is internal staff, the cadence is continuous (it is open whenever someone is working), and governance is essentially nil because nothing gets published from it. The user wants “what’s mine, what’s overdue, what changed since last week.” The output is the workflow itself. The user does work in the surface; they don’t read it and walk away. Its failure mode is being a day out of date.
Surface C is the executive view. Leadership and board, quarterly cadence, governance light because the audience already trusts the source. The user wants “are we winning at the portfolio level,” not site-level granularity. The artifact is usually projected in a meeting, occasionally exported to a PDF nobody reads carefully.
These are three different products. Who reads them, how often, under what review, with what access, how long the output must last — they differ on every one of those, and not one of them bends gracefully.
| | Surface A: report | Surface B: console | Surface C: executive |
|---|---|---|---|
| Audience | external, specific | internal, role-based | leadership, portfolio |
| Cadence | per cycle | continuous | quarterly |
| Governance | reviewed, approved | none | light |
| Permission | per-recipient | role-based | broad |
| Output | branded document | live workspace | summary view |
No single tool is good at all three surfaces. What most teams end up with is a tool that is strong on one and just acceptable on the others, accepted as “the BI stack” by default. That is not a tool problem. It is what the org chart produces: one person who owns “BI,” one tool that owns “all reporting.” It will reproduce with whichever vendor you pick next, until the structure changes.
Why no BI vendor solves the publication problem
The interesting case is Surface A, because that is where the BI vendor pitch breaks down hardest and where most organizations underweight the gap.
Most BI tools on a typical evaluation matrix assume the deliverable is a screen rendered live against a data source. The publication problem needs the inverse: a templated document with branded typography, headers, footers, footnotes, and language that comms or design owns separately from the analyst. PDF export from a BI tool produces a screenshot, not a structured document. The artifact a board member emails to a colleague is a Word file, or a PDF that looks like one. Nobody forwards a dashboard URL with a login prompt behind it.
There is also a layout-ownership problem. In a BI tool, the analyst owns the visual layout because the layout is the dashboard. In a real publication system, comms or design owns the template and the analyst owns the data. Those should be different people with different review authority and different release cadences. BI vendors don’t have a credible answer for that split, and most don’t try.
The right architecture for Surface A is the one formal-publication systems already use: legal contracts, financial statements, regulatory filings, clinical trial reports. Templated layout owned by the layout team, data tokens inserted at render time, the result snapshotted at publication. Concretely: a Word or PowerPoint template registry, a token resolver that pulls metric values from the warehouse at render time, a snapshotting layer that records what data each published artifact came from, and a review-state machine — draft, commented, approved, published.
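A minimal sketch of the token resolver and the snapshotting layer, using only the Python standard library. The metric row, the template text, and the function names are all hypothetical; a real system would read the row from the warehouse and the template from a Word or PowerPoint registry rather than from a string.

```python
import hashlib
import json
from string import Template

# Hypothetical metric row, standing in for one row read from the warehouse.
METRIC_ROW = {
    "site_name": "Example School",
    "cycle": "Spring 2024",
    "domain_4_score": "3.4",
}

# Hypothetical template text owned by comms; $tokens are filled at render time.
TEMPLATE_TEXT = "In $cycle, $site_name scored $domain_4_score on Domain 4."


def render(template_text: str, metric_row: dict) -> str:
    """Resolve tokens. Raises KeyError if the template names a missing metric."""
    return Template(template_text).substitute(metric_row)


def snapshot(template_text: str, metric_row: dict) -> dict:
    """Record exactly which inputs a published artifact came from."""
    payload = json.dumps(
        {"template": template_text, "metrics": metric_row}, sort_keys=True
    )
    return {
        "inputs": json.loads(payload),
        "inputs_sha256": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
        "output": render(template_text, metric_row),
    }
```

Storing the inputs alongside their hash is what lets someone re-render the Spring 2024 artifact later and confirm it came from the same data and the same template.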
This is unglamorous plumbing. It has no vendor logo. But it gives you four things no BI tool can match. The output is reproducible: the same metric snapshot and the same template produce a byte-identical file forever. The layout is independent: comms can rewrite the template without involving engineering. The governance reaches sentence level: every published string has a known author, reviewer, and timestamp. And it stays LLM-safe: a model can fill narrative tokens from a single metric row, with the reviewer’s sign-off recorded before anything publishes.
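The review-state machine can be sketched in a few lines. The four state names come from the text; the allowed transitions (including letting a commented draft go back to draft), the `Artifact` class, and its field names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Allowed transitions between the four review states.
# Assumption: a commented artifact may return to draft for revision.
TRANSITIONS = {
    "draft": {"commented"},
    "commented": {"draft", "approved"},
    "approved": {"published"},
    "published": set(),
}


@dataclass
class Artifact:
    text: str
    author: str
    state: str = "draft"
    history: list = field(default_factory=list)

    def advance(self, new_state: str, actor: str) -> None:
        """Move to new_state, recording who did it and when."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot move from {self.state} to {new_state}")
        self.history.append(
            {
                "from": self.state,
                "to": new_state,
                "actor": actor,
                "at": datetime.now(timezone.utc).isoformat(),
            }
        )
        self.state = new_state
```

Because "published" is reachable only from "approved", a narrative string filled by a model cannot go out without a recorded sign-off in `history`.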
The mistake organizations make is being talked into replacing this path with “embed a Tableau dashboard in a PDF” because it sounds like one less system. The system that goes missing is the one that was shaped for the job.
The keystone: one canonical computation per concept
The keystone argument is the one most worth holding firm on. It is the difference between a system that scales and a system that erodes trust over time.
Whatever combination of surfaces gets built, all of them should read from a single semantic layer where every metric is defined exactly once, in code, tested, and version-controlled. dbt is the dominant choice for this on the warehouse side, but the principle is older than the tool. It is just one canonical computation per concept.
In practice, each metric has a single materialization. Domain rollups, scoring rules, trajectory classifications: each defined in one file, with one reviewable diff. Tests run automatically — grain uniqueness, score-range bounds, mapping coverage, completeness thresholds. The build graph tracks which downstream artifacts depend on each upstream change, so when an item map advances from V3 to V4, you know which dashboards and reports need re-validation.
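The four kinds of test named above can be sketched as plain Python checks. In a dbt project these would normally be declared as data tests against the model; the version below only illustrates what each check asserts. The column names, the 1 to 5 score scale, and the 0.8 completeness threshold are assumptions.

```python
from collections import Counter

SCORE_MIN, SCORE_MAX = 1.0, 5.0  # assumed score scale
MIN_COMPLETENESS = 0.8           # assumed completeness threshold


def check_grain_unique(rows, keys):
    """Return grain keys that appear more than once."""
    counts = Counter(tuple(r[k] for k in keys) for r in rows)
    return [key for key, n in counts.items() if n > 1]


def check_score_range(rows):
    """Return rows whose score falls outside the allowed bounds."""
    return [r for r in rows if not (SCORE_MIN <= r["score"] <= SCORE_MAX)]


def check_mapping_coverage(item_ids, item_map):
    """Return item ids that have no domain in the item map."""
    return sorted(set(item_ids) - set(item_map))


def check_completeness(rows):
    """Return rows where too few of the expected items were answered."""
    return [r for r in rows if r["answered"] / r["expected"] < MIN_COMPLETENESS]
```

Each function returns the offending records rather than a boolean, so a failing build can say which rows broke the rule.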
All three surfaces then read from the same materialized marts. The publication template resolves a row from marts.f_cycle_metrics. The operator console queries marts.f_open_actions. The executive dashboard reads marts.f_portfolio_rollup. There is one source of truth for any metric, and it costs about one engineer-month of upfront modeling plus the standing discipline of not adding shadow definitions.
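The build graph's impact analysis amounts to a traversal from the changed node to everything downstream of it. In the sketch below the mart names come from the text, while the `item_map` node, the surface node names, and the edges between them are hypothetical.

```python
from collections import deque

# Edges point from an upstream node to the nodes that read from it.
# Mart names come from the text; the other names and all edges are hypothetical.
DOWNSTREAM = {
    "item_map": ["marts.f_cycle_metrics", "marts.f_portfolio_rollup"],
    "marts.f_cycle_metrics": ["publication_template"],
    "marts.f_open_actions": ["operator_console"],
    "marts.f_portfolio_rollup": ["executive_dashboard"],
}


def affected(node: str) -> list:
    """Return every node downstream of `node`, found by breadth-first search."""
    seen, queue = set(), deque([node])
    while queue:
        for child in DOWNSTREAM.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)
```

With these assumed edges, `affected("item_map")` lists the two marts plus the publication template and the executive dashboard, and leaves out the operator console, which is the re-validation list for a V3 to V4 change.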
That discipline is the part that fails. Which is why the architecture has to do the enforcing.
The drift war story
The reason “do the keystone first” is non-negotiable is the failure mode it prevents.
Without a single semantic layer, this is what happens. The publication template uses select avg(item_value)… written by an analyst in 2023. The dashboard uses Tableau’s calculated-field syntax with a slightly different filter for null handling. The Streamlit prototype uses pandas with .mean() and forgets to drop incomplete responses. The operator console uses a Python helper that hardcodes which items belong to which domain. Each calculates “Domain 4 score” with technically reasonable but slightly different rules.
Same site, same cycle, and each artifact reports a different number. A program lead notices. They escalate. The analyst spends a week reconciling. Next quarter it happens again, because nothing structurally changed: the definitions still live in separate places, and any one of them can drift on its own.
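A small illustration of how little it takes. The three functions below are hypothetical stand-ins, not the actual queries described above: each applies a defensible rule for nulls and incomplete responses to the same made-up data, and each returns a different average.

```python
from statistics import mean

# Hypothetical item responses for one site, one cycle, one domain.
# None marks a skipped item; "complete" marks whether the response was finished.
RESPONSES = [
    {"item_value": 4, "complete": True},
    {"item_value": 2, "complete": True},
    {"item_value": None, "complete": True},
    {"item_value": 5, "complete": False},
]


def sql_style_avg(rows):
    """Like SQL avg(): nulls are skipped, incomplete responses are kept."""
    return mean(r["item_value"] for r in rows if r["item_value"] is not None)


def null_as_zero_avg(rows):
    """A variant that counts a skipped item as zero."""
    return mean(r["item_value"] or 0 for r in rows)


def complete_only_avg(rows):
    """Drops incomplete responses first, then skips nulls."""
    return mean(
        r["item_value"]
        for r in rows
        if r["complete"] and r["item_value"] is not None
    )
```

On this data the three rules give roughly 3.67, 2.75, and 3, all labeled “Domain 4 score”.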
This is one of the most common failure modes in analytics products. It’s slow. It rarely triggers a single alarm. It just gradually corrodes the credibility of every artifact the team produces, until people stop quoting numbers in meetings and start saying “I’d want to verify that.” Once that phrase shows up, the product is functionally dead even if it’s still being maintained.
This is what a decision-system problem looks like traced to its root: not a missing dashboard, but a missing source of truth, taxing every decision downstream with the cost of re-verification. Every number now arrives with a question attached, and that question is a load the decision-maker carries before they can act.
The semantic layer prevents drift by removing the alternative. If the only way to compute “Domain 4 score” is to read marts.f_school_domain_wave, drift cannot happen between surfaces, because there is no second definition to drift toward. That is the keystone property. It is not that everyone agrees to be careful; it is that there is nothing to be careless about.
Sequencing
This is where most teams get it wrong, including teams that nominally agree with everything above. The instinct is to start with the visible surface: pick a BI tool, build a dashboard, show leadership progress. The result is a polished surface over an ungrounded data layer, which is worse than the prototype it replaced, because now it looks credible.
The right sequence is:
1. Build the semantic layer first. dbt project, marts, tests, version pinning. No surfaces yet. This feels like nothing is shipping. It is the most important phase.
2. Then Surface A, because publications are the highest-stakes artifact and the architecture is mostly plumbing on top of templates that already exist. The token resolver and the snapshotting are well-understood patterns; the work is in template inventory and review-state machinery, not in invention.
3. Then Surface B, the operator console. Streamlit-in-warehouse or a thin app on top of the marts. This is where actions get assigned and tracked, which is where program value compounds.
4. Then Surface C if needed at all, and the vendor question is small at this point. Tableau, Power BI, Looker: they all read from a properly modeled warehouse competently. Pick whatever the org already has skills in. This is the last decision and the least consequential one.
The vendor-evaluation matrix that arrives at the start of this work is not wrong. It is premature. Run it after the semantic layer exists and you will find the choice barely matters. Run it before, and the six months you spend on it will go toward tuning the wrong variable, while the layer that decides the outcome still hasn’t been built.
Closing
The question “which BI tool should we use” feels like a real decision: there is a matrix, there are vendors, there are demos. But it is the last decision in the sequence, and the sequence is what determines whether the product works.
Three surfaces. One keystone. Vendor last.
If a BI evaluation matrix lands on your desk, the right first response is to ask which surface it’s for, then to ask what sits underneath. If the answer to either is unclear, the matrix isn’t ready to be filled in yet.
Underneath all of it is one discipline: a reporting system is only as good as the decision it was built to serve. Most reporting gets built backward, from the tool the organization happened to buy, instead of forward from the decision someone has to make. Name the decisions. Give each its own surface. Put one canonical computation beneath them all. The vendor is the last and smallest choice. From fragmented to decision-ready is the distance that sequence closes.
Written May 2026 for the Analytic Bytes Library. Tool capabilities and product names cited reflect that period; the architectural argument is intended to outlast specific vendor features.
Questions, pushback, or a problem that looks like this one? Write to chai@analyticbytes.systems.