One Big Table (OBT) Pattern: Fan Traps, Grain Violations, and When Denormalization Actually Works
In Q3 of last year, a B2B SaaS company's growth team presented their quarterly board report. Marketing ROAS: 4.2x across all paid channels. LinkedIn ROAS: 6.1x. Google Ads: 3.8x. The board was delighted. Budget was reallocated toward LinkedIn. A senior marketer got a raise.
Eight months later, a new data engineer joined the team and did something nobody had done before: she reconciled the attribution dashboard against Stripe. Total attributed conversion value in the dashboard: $2.4 million. Actual revenue in Stripe for the same cohort: $880,000.
The discrepancy wasn't fraud. It wasn't a bug in Stripe or a misunderstanding of the attribution window. It was a fan trap, baked into the foundation of the marketing attribution One Big Table that had been powering every board report for three quarters.
The table had been built in a single weekend sprint, praised for its elegant simplicity, and never questioned. After all, it was just one table. How complicated could it be?
This is the story of that table, why it lied, and — more importantly — the specific conditions under which a One Big Table is actually the right choice and the specific conditions under which it will, quietly and confidently, destroy your analytics.
What OBT Actually Is (It's a Choice, Not Just a Table)
OBT is a deliberate architectural commitment to grain, not a shortcut.
A One Big Table is exactly what it sounds like: one physical table — wide and denormalized — that folds a fact table and the dimension attributes you would normally join in into a single row-grain surface.
Instead of resolving campaign details at query time:
JOIN dim_campaign ON fact_session.campaign_key = dim_campaign.campaign_key
…those attributes are materialized on every fact row (here, the session). Practically, that means:
- No joins in downstream SQL or semantic models for those attributes
- One table wired into your BI tool instead of a star with N legs
- Maximum simplicity for analysts and dashboard builders — at the cost of width, duplication, and very strict grain discipline (the rest of this post)
This is not a new idea. Oracle was building materialized views that did this in 1998. The modern framing — "OBT" as a named pattern, as a deliberate architectural choice — emerged primarily through the dbt community around 2021–2022, when analytics engineers started wrestling seriously with where in the transformation pipeline denormalization belongs.
The defining question of OBT architecture isn't "should I denormalize?" — every serving layer denormalizes something. The question is: at what stage does denormalization happen, and who declares the grain?
In a star schema, grain is declared when you define the fact table: one row per session, one row per order line item, one row per ad impression. Dimension tables carry attributes with no grain ambiguity because they're joined to a fact table that has already declared its grain.
In an OBT, grain is declared… informally. In a comment somewhere. In a Slack message from six months ago. In the original author's head, which left the company in February.
That informal grain declaration is where the trouble starts.
The Two Ways OBT Silently Corrupts Your Numbers
The Fan Trap (The Double-Counting Disaster)
Back to that marketing attribution table.
Declared grain: one row per session. Every column was sourced from one of four upstream tables:
| Column(s) | Source table | Join key | Native grain |
|---|---|---|---|
session_id, user_id, conversion_id | web_sessions | — | session |
campaign_name, platform, channel | ad_campaigns | first_touch_campaign_id | campaign |
conversion_value_cents | conversions | session_id | session ✅ |
spend_cents | ad_spend_daily | campaign_id + date | campaign-day ❌ |
Three of those four sources sit at session or coarser-than-session attribute grain, and join cleanly. spend_cents does not. It comes from ad_spend_daily, which has one row per campaign per day — a different grain entirely. Pulling it onto a session row crosses a grain boundary the schema never advertised.
Worked example. Campaign A spends $5,000 on a Tuesday and gets 200 sessions attributed to it that day. After the join, the OBT looks like this:
| sessions for Campaign A on Tue | spend_cents per row | SUM(spend_cents) |
|---|---|---|
| 200 | 5,000 | 1,000,000 ❌ |
| Actual spend | — | 5,000 ✅ |
That's a 200x overstatement of spend — and the warehouse will return it in under a second, with no warning.
In the original table, the ROAS calculation was:
-- This query looks correct. It is not.
SELECT
campaign_name,
SUM(conversion_value_cents) / SUM(spend_cents) AS roas
FROM obt.marketing_attribution
WHERE session_date BETWEEN '2025-07-01' AND '2025-09-30'
GROUP BY campaign_name
Reading that query piece by piece:
SUM(conversion_value_cents)— correct. Converting sessions carry the value once; non-converting sessions areNULLand contribute zero.SUM(spend_cents)— 200x overstated, by exactly the fan-trap mechanism above.
The ratio of those two SUMs is what the dashboard called ROAS = 4.2x. The grain-safe ratio was below 1.0. So for three quarters, the company was burning money on LinkedIn while its own dashboard celebrated it as the best-performing channel. The earlier "3x overstatement" wasn't a fluke — it was the math working exactly as designed, on broken data.
A fan trap happens when you join two tables that both have a one-to-many relationship with a third table. In an OBT, this creates invisible row multiplication — and your aggregates become fiction.
The correct fix requires breaking the grain problem at the schema level, not the query level:
-- Correct: use the grain-appropriate table for spend
SELECT
c.campaign_name,
SUM(cv.conversion_value_cents) AS total_attributed_revenue,
SUM(s.spend_cents) AS total_spend,
SUM(cv.conversion_value_cents)::FLOAT / NULLIF(SUM(s.spend_cents), 0) AS roas
FROM dw.fact_conversion cv
JOIN dw.dim_campaign c ON cv.campaign_key = c.campaign_key
JOIN dw.fact_ad_performance_daily s
ON s.campaign_key = c.campaign_key
AND s.date_key = cv.date_key
WHERE cv.date_key BETWEEN 20250701 AND 20250930
GROUP BY c.campaign_name
This query works because fact_ad_performance_daily and fact_conversion are at different grains, joined at the dimension level — not through the OBT's undeclared grain.
Cross-Grain Aggregation (The Metric That Means Nothing)
The fan trap is the dramatic failure mode. Cross-grain aggregation is the quieter, more insidious one.
Picture an OBT that mixes two different grains in the same row:
| Column | Native grain | Per-row meaning |
|---|---|---|
conversion_value_cents | conversion (one row per event) | What this conversion was worth |
contact_mrr_cents | user (one value per CRM contact) | What this contact pays per month |
If a user has three conversions, the OBT will contain three rows for that user, each carrying the same contact_mrr_cents value. So:
The downstream symptom: the chart shows enterprise MRR up 40% QoQ, while ARR in the CRM stays flat. The OBT isn't fan-trapped here — it's lying through a column that simply doesn't belong at session/conversion grain.
Cross-grain columns are OBT columns that exist at a different level of aggregation than the table's primary grain. They almost always look harmless:
contact_company_size(seems like a useful attribute — it's actually a user-level property on a session-level table)campaign_total_budget_cents(campaign-level on a session-level table)product_category_total_revenue(category-level on an order-line-level table)
None of these columns can be safely SUMmed. They can only be used as filter or GROUP BY columns. That constraint is not self-documenting. It won't be enforced by your warehouse. It won't appear in your BI tool's column selector. The analyst who discovers it after summing the wrong column is not at fault — the schema is.
The Three Arguments for OBT (and Which Ones Hold Up)
Argument 1: "OBT is faster"
This one holds up — with caveats.
In 2022, Fivetran published a controlled benchmark: the same BI-style workload on Redshift, Snowflake, and BigQuery, comparing a star schema to a denormalized OBT. Across all three, OBT was roughly 25–50% faster for that style of query.
| Engine | OBT vs. star (approx.) |
|---|---|
| Redshift | 25–30% faster |
| Snowflake | ~25% faster |
| BigQuery | ~50% faster |
The BigQuery gap was the largest: that engine's columnar planner often spends more on multi-table plans than on scanning one wide relation.
Why wide tables win here
- Single-table scans are what columnar warehouses are optimized for: sequential reads through one physical object, with pruning on the columns you actually select.
- Fact–dimension joins add shuffle and coordination work — especially when dimension rows aren't co-located with the fact rows they attach to, so the join can't stay local to one node.
The caveat: Fivetran's benchmark measured the performance of correct queries. It says nothing about incorrect queries — queries that produce wrong numbers in 200 milliseconds are not a success. The fan trap in the previous section produces wrong results in perfectly fast OBT queries. Speed is irrelevant if the answer is wrong.
Argument 2: "OBT is simpler to maintain"
This one is complicated.
When simplicity is real
For a narrow, single-use-case OBT — e.g. sales_obt serving one dashboard, one team, one grain — maintenance really is lighter:
- One dbt model
- One production table to watch
- One place to change when a business definition moves
When it actually falls apart
That story breaks as soon as OBTs multiply. Picture marketing_obt, sales_obt, product_obt, and customer_360_obt side by side: you now have four denormalized copies of customer (and related entities), each with a slightly different definition of "customer".
Change the rule once — e.g. trial users don't count — and you inherit:
- Four models to edit
- Four test suites to keep honest
- Four surfaces where the definition can drift without anyone noticing
The star schema's conformed dimension solves this exactly: one dim_customer, one definition, propagated automatically to every fact table that joins it.
The verdict: OBT is simpler to maintain for a single use case, longer than expected for multiple use cases sharing the same underlying entities.
Argument 3: "OBT works better with our BI tool"
This one used to be true. In 2026, it mostly isn't.
Then (pre-semantic-layer): drag-and-drop BI tools like early Tableau and Power BI would happily let an analyst pull fields from multiple tables onto a canvas and generate semantically wrong SQL with no warning — wrong joins, wrong grain, wrong numbers.
A pre-joined OBT was defensive programming: if the tool can't be trusted to join correctly, hand it a table that's already flat.
Now (semantic layer everywhere): that defensive move is mostly obsolete. Each major BI / modeling tool has caught up — and several are actively optimized for star schemas, not flat tables:
| Tool | What it does | Star vs flat |
|---|---|---|
| Looker (LookML) | Semantic model with explicit joins | Built around stars; PDTs exist for the cases where pre-joining is genuinely needed |
| dbt Semantic Layer | Defines metrics over multiple tables | Multi-table metrics resolved at query time; no OBT required |
| Power BI (VertiPaq) | In-memory columnar engine | Performs best on star schemas, not on wide flat tables |
So the "we need OBT for the BI tool" argument now usually means one of two things:
- The team is using a tool / model that can handle stars, but nobody has set up the semantic layer yet.
- The tool genuinely needs a flat input (rare in 2026; common in legacy stacks).
The verdict: of the three OBT arguments, this is the weakest in 2026. If your BI tool truly needs a flat table to be correct, fix that at the BI layer (semantic model, certified dataset, view) — don't permanently bake the join into your warehouse and inherit a fan-trap surface forever.
When OBT Is the Right Call: Four Legitimate Scenarios
OBT is the right pattern when: one team owns the grain, one question drives the model, and the table has under ~100 columns. Outside those conditions, you're building a time bomb.
1. Single-Grain, Single-Use-Case Marts
If your OBT is truly at one grain — one row per session, no columns from other grains — and it serves one dashboard for one team, OBT is excellent. The sales_rep_performance_obt that has one row per sales activity and 40 columns of sales rep and account attributes is a well-designed OBT. It's fast, it's simple, and the grain is unambiguous.
The test: Can every column in the table be safely SUMmed? If yes, your OBT is probably safe.
2. Customer 360 / Entity-Centric Tables
When the grain is "one row per entity" (one row per user, one row per account), OBT is a natural fit. The entity IS the grain. There's no fan-out risk from dimension attributes because everything is at the entity level. These tables are also the natural output of dbt's unique_key incremental strategy — the merge operation makes sense at entity grain.
This is the "Customer 360" pattern: one row per customer with all CRM, product, billing, and engagement attributes inlined. Salesforce uses this. Hubspot uses this. The risk here isn't fan traps — it's staleness and SCD complexity, not counting errors.
3. Event-Level Analytics With Only Event-Level Attributes
An OBT for page analytics where every column is derived from the page view event itself — event_type, session_id, user_id, page_url, referrer, time_on_page, device_type, country — is safe because there is no cross-grain risk. No campaign spend. No contact MRR. No product category totals. Pure event-level columns at event grain.
Tools like Mixpanel and Amplitude internally use this model. It works because the engineers who designed it were disciplined about what can and cannot go on the table.
4. Prototype and Exploration Contexts
When you're exploring a new data domain and need to move fast — getting to a dashboard in days rather than weeks — OBT is the right initial choice with an explicit expiration date. Build it, learn from it, document its limitations, and schedule the refactor. The mistake isn't building an OBT for speed; the mistake is never scheduling the refactor and treating the prototype as production.
dbt's own blog post on marketing attribution modeling acknowledges this: the OBT often appears as the "let's just get the data out" layer before proper dimensional modeling is applied. That's fine as long as everyone knows what it is.
Dimensional Modeling vs OBT: The Actual Trade-Off Matrix
| Dimension | Star Schema | One Big Table | Verdict |
|---|---|---|---|
| Query performance (BI workloads) | Good | 25–50% faster (Fivetran, 2022) | OBT wins, modestly |
| Very wide tables (300+ cols, Snowflake) | n/a | Slower than narrow schema | Star wins |
| Fan trap risk | None (separate grains in separate tables) | High if multiple grains mixed | Star wins decisively |
| SCD Type 2 (historical accuracy) | Native (effective_from, is_current) | Awkward; requires full rebuild | Star wins decisively |
| Conformed dimensions (cross-team reuse) | Native (one dim_customer, shared) | Not possible (each OBT copies attributes) | Star wins |
| ETL/ELT build complexity | Higher (more models, more joins) | Lower (one model per use case) | OBT wins for small teams |
| Governance and testability | Strong (FK constraints, data tests at dim level) | Weaker (all constraints at OBT level) | Star wins at scale |
| BI tool setup | Requires semantic layer / LookML | Direct table connection | OBT wins for simple tools |
| Storage cost | Lower (normalized) | 3x+ higher (Fivetran: 30GB → 90GB for same data) | Star wins |
| dbt incremental model compatibility | Natural at fact grain | Complex at entity grain | Draw (context-dependent) |
The matrix tells a consistent story: star schema wins at scale and governance, OBT wins at speed and simplicity for narrow use cases. The companies that get in trouble with OBT are the ones applying OBT's "speed and simplicity" benefits to a use case that requires star schema's "governance and correctness" properties.
How to Design the Source Schema to Make OBT Safe
The fan trap in the marketing attribution OBT wasn't inevitable. It was the result of a source schema that made cross-grain joins look natural because the foreign keys were there. The fix isn't always "switch to a star schema" — but the grain contract has to live somewhere an engineer will actually see it before they write the SUM.
Here's the same digital-marketing domain modeled with the OLTP source, the star schema, and the OBT side by side — with the danger zones annotated at the schema level, so the grain violation is visible before any query is written:
The grain contract has to live where engineers look
A grain rule in a Slack message from six months ago doesn't survive a new hire's first dashboard. Useful places to anchor it:
- On the ERD / DBML — a
DANGER: campaign-day grain. Do not SUM.comment on the column itself, so it shows up in DBeaver, the data catalog, and the dbt docs site. - In dbt
schema.yml— a model-level graindescription, aunique_key, and per-column descriptions on the unsummable fields. - In an
obt_audittable — one row per OBT recording declared grain plus the list offan_trap/cross_graincolumns, queryable by automated tests in CI.
When all three agree, the contract is harder to lose than to remember.
When you have too many DANGER columns, you've outgrown OBT
Count the unsummable columns in your OBT. Two or three is manageable — the contract above keeps them honest. Five, ten, fifteen? You're not maintaining an OBT anymore; you're hand-rolling a broken semantic layer.
That's the graduation signal. The fix is to stop adding DANGER notes and move grain logic into a tool built for it:
- MetricFlow (dbt Cloud) — metrics defined once, joined at query time with grain awareness
- Cube — same idea, runs anywhere
- LookML (Looker) — semantic model with explicit join paths
The deeper point: star schemas make grain explicit and enforced; OBTs make grain implicit — a YAML comment that an analyst will forget in a quarter.
Make grain a design-time conversation with TalkingSchema
TalkingSchema treats grain as a first-class part of schema review, not a postmortem artifact. The AI reads your DBML, flags cross-grain joins, annotates unsummable columns at the schema level, and offers a dimensional rewrite when an OBT is asking too much of one row — so the fan-trap conversation happens at design time, before the dashboard goes to the board.
Try it with prompts like:
Prompt: "Review this
obt.marketing_attributionschema for grain mismatches. Which columns cannot be safely aggregated with SUM? Which table should I use for spend instead?"
Prompt: "Show me the dimensional model equivalent of this marketing attribution OBT — and call out the grain problems the OBT was hiding."
Prompt: "Design a multi-touch attribution schema that avoids fan traps. Annotate every cross-grain column with a
DANGER:note."
Want to design the safe version of this OBT schema — or see what it looks like as a dimensional model? → Try TalkingSchema and see how the grain conversation changes the design.