Skip to main content

Kimball vs Inmon vs Data Vault 2.0: Data Warehouse Architecture Guide

Every data team eventually walks into the same room.

Someone is convinced Kimball is dead and cloud storage fixes everything. Someone else is just as convinced that Inmon is the only serious enterprise architecture. A third voice slides in with "Data Vault 2.0 — hubs, links, satellites, audit-ready, future-proof." Forty minutes later nothing has been decided, six weeks later nothing has been built, and the dashboards the business asked for in Q1 are now a Q3 problem.

Here's my promise: this post will help you pick an approach the way an architect picks a structure—not like a sports fan picking a jersey.

The Missing Layer in ERD Tools: Why Database Schemas Need Business Context

If you've read our companion piece on ERD tools in 2026, you've seen the seven categories the market currently breaks into — developer tools, enterprise governance platforms, diagramming surfaces, database clients, warehouse modelers, AI-native design tools, schema-as-code. Each is optimized for a different job.

What none of them solve — yet — is the job that matters most as schemas evolve over years: keeping the meaning of a model accurate as the business it describes changes.

This is the piece where we make the case that the next real category in this space isn't a better diagram, a better DBML editor, or a better AI Agent. It's a context-aware modeling system. And it's the bet TalkingSchema is taking.

Database Normalization Is Not Enough: Context-Aware OLTP Schema Design

The data is right there. The meaning is not.

That gap — between what your OLTP schema stores and what it means — has quietly become the most expensive problem in modern data work. The textbook half is solved. Tables normalize to 3NF, foreign keys check out, migrations ship without locking, and most senior engineers can stamp out a clean transactional schema in their sleep.

The unsolved half is everything the schema doesn't say. The intent behind a column. The grain of a row. The business definition of an enum value. The mutability of a foreign key target. None of it lives in the schema, the catalog, or the data dictionary — and the cost of that silence falls almost entirely on the teams downstream: analysts reverse-engineering column meanings from old Slack threads, ML engineers guessing at grain, executives squinting at dashboards quietly compromised by an undocumented default.

We're not alone in calling this out. Scroll through what the rest of the industry has been writing about it:

Star Schema vs Snowflake Schema: When to Use Each, with dbt and BI Tool Guidance

In one corner: “Star schema. Always. Denormalize everything. Kimball said so.”

In the other: “Snowflake schema is cleaner. More normalized. Your dimension tables won’t bloat. Third normal form is not a dirty word.”

The problem is that the “star vs. snowflake” debate happens eight months later, when the dashboards are already broken.

Zero-Downtime Schema Migrations: The Expand-Contract Pattern for PostgreSQL

A schema migration rarely looks scary when you first write it.

Add a column. Rename a field. Split some metadata into a cleaner table. Add an index so the new dashboard stops timing out. In a pull request, each of these can look like one tidy line of SQL.

Production sees something else — a table with millions of rows, old application code still running beside new code, background jobs writing during the deploy, dashboards reading every few seconds, and one migration trying to change the floor while everyone is still walking on it.

That is why zero-downtime migrations are not really a SQL problem. They are a coordination problem.

And coordination is exactly where most migration workflows are weakest. The plan lives in a Slack thread. The risk is explained in a comment. The rollout sequence is in someone's head. The schema diagram shows the destination, but not the journey.

That's the whole point of this article: not to turn you into a database lock encyclopedia, but to show how a team can make risky schema changes boring on purpose.

One Big Table (OBT) Pattern: Fan Traps, Grain Violations, and When Denormalization Actually Works

In Q3 of last year, a B2B SaaS company's growth team presented their quarterly board report. Marketing ROAS: 4.2x across all paid channels. LinkedIn ROAS: 6.1x. Google Ads: 3.8x. The board was delighted. Budget was reallocated toward LinkedIn. A senior marketer got a raise.

Eight months later, a new data engineer joined the team and did something nobody had done before: she reconciled the attribution dashboard against Stripe. Total attributed conversion value in the dashboard: $2.4 million. Actual revenue in Stripe for the same cohort: $880,000.

The discrepancy wasn't fraud. It wasn't a bug in Stripe or a misunderstanding of the attribution window. It was a fan trap, baked into the foundation of the marketing attribution One Big Table that had been powering every board report for three quarters.

Best ERD and Database Design Tools 2026: dbdiagram, ChartDB, erwin, SqlDBM, TalkingSchema Compared

An opinionated comparison from the team building TalkingSchema.

Every "best ERD tools" article follows the same template: ten tools, a feature matrix with checkboxes, a verdict that carefully avoids offending any vendor, and a call-to-action to "try them all and see what works for you." They are useless.

This post does something different. It starts by naming the categories — because the reason most ERD comparisons mislead you is not that they picked the wrong tools. It's that they treat all ERD tools as if they're solving the same problem.

Generate OpenAPI, GraphQL, SQLAlchemy, and TypeORM from Your Database Schema

Your PostgreSQL column says subscriber_count_cache. Your OpenAPI spec says subscriber_count. Your GraphQL type says subscriberCount. Your TypeORM entity says subscriberCountCache. Your SQLAlchemy model says subscriber_count.

Which one is right? All five. And that's the problem.

This is schema drift — the slow, silent divergence between your database, your API contract, and your ORM layer that starts the moment these representations are maintained separately. It doesn't happen because engineers are careless. It happens because there's no structural reason they should stay synchronized. Each representation lives in a different file, edited by a different person, on a different cadence. Eighteen months in, you have five artifacts that all claim to describe the same data model and agree on nothing.

The fix isn't more discipline. It isn't a code generator wired into your CI. It's a completely different workflow:

Design your database schema once in TalkingSchema. Then talk to your schema to generate every other representation — OpenAPI, GraphQL, SQLAlchemy, TypeORM — from the same canonical model, on demand.

That's the whole post. The rest of it just shows what that looks like.

Database Schema to Prisma, Drizzle, and Zod: Design First, Generate Everything

A user submits a form. Your TypeScript compiles clean. Your Zod validator passes. Your tRPC procedure returns a 200. And somewhere in the response, a column that was supposed to be Date | null arrives as an ISO string — and your frontend renders NaN where a timestamp should be.

You've seen this bug. You've shipped this bug. The type system didn't catch it because the type was never derived from the database in the first place — it was written by hand, approximately correct, and silently wrong.

This is the source-of-truth problem. TypeScript doesn't solve it. Where you define your data model does.

OLTP vs OLAP: Schema Design, CDC, DuckDB, and the Decision Framework

It was a Tuesday afternoon when the CTO sent the Slack message that every data engineer dreads:

"The claims dashboard has been loading for 47 seconds. Our customers are complaining. What happened?"

What happened was straightforward. A data analyst ran a reasonable aggregation — GROUP BY claim_type, EXTRACT(year FROM incident_date) — against production PostgreSQL: the same cluster that was already handling ~3,200 concurrent insurance claims from adjusters across eight states.

The query locked an index, response times spiked, and on-call was paged at 2 AM.

What failed wasn't the schema or the SQL. It was the assumption: that a system tuned for thousands of short, correct transactions per minute could also scan years of history for analytics without hurting live operations.

That is the OLTP vs OLAP problem in one outage. It is not abstract — it costs engineering hours, customer trust, and real infrastructure budget — and it usually begins with a schema decision from before analytics was on the roadmap.

Slowly Changing Dimensions (SCD) Type 1, 2, 6: Make the Decision in Your Schema, Not Your ETL

ETL is the bullied kid of the data stack. Dashboard wrong? Blame the ETL engineer. Patient count drops 8% on a Monday? ETL. Cost-sharing numbers shift overnight and the CFO wants answers by 9 AM? ETL again. The Slack thread always lands on the same person — who almost certainly isn't the one who introduced the lie.

Let's look at one of those lies. A patient was reclassified from a Bronze insurance tier to a Gold tier in March, but the claims report has been grouping her Bronze claims under the Gold bucket ever since. The ETL job that loaded the new tier overwrote the old value. Every historical encounter that happened under Bronze coverage now looks like Gold.

For context: the healthcare warehouse canvas below shows the source systems and warehouse schema we're working with.

This is what a missing SCD Type 2 decision looks like in production. It's almost never caught by data quality tests, because every row is individually valid. The lie is in the history, not the data. And it didn't originate in the ETL code where it eventually surfaces — it was introduced six months earlier, at the schema-design layer, when someone made an SCD typing decision (or silently failed to) and never wrote it down anywhere outside of a MERGE statement.

I've been designing data warehouse dimension layers long enough to have strong opinions here. The SCD taxonomy is one of the most decision-dense areas of dimensional modeling — eight types with overlapping tradeoffs, no universal agreement on when to use what, and performance consequences that bite teams six to twelve months after they think they've done it right. This post is my attempt to cut through all of it.

Database Schema Design Patterns: Polymorphism, Hierarchies, Audit Logs, Soft Deletes, and SCD

Schema design isn't just about getting data to persist. It's about modeling the shape of your domain — the things that exist, the relationships between them, and the constraints that keep it all consistent. A good schema makes queries fast, business logic obvious, and migrations safe. A bad one makes all three miserable.

These are five patterns that appear in almost every non-trivial system, with notes on when to reach for each one.

AI-First Database Design: Build Data Models from Natural Language

For decades, designing a database meant one of two things: writing SQL DDL by hand, or dragging boxes around a diagramming tool. Both approaches force you to think in the machine's language before you've finished thinking in your own.

TalkingSchema is built on a different premise: you should be able to describe your data model the way you'd explain it to a colleague, and get a working, exportable schema out the other side.

Export Database Schema to SQL, Prisma, Drizzle, OpenAPI, GraphQL, and More

A common frustration in backend development: you've designed a schema in one context (a diagramming tool, a migration file, a Prisma schema), and now you need it in a different format for a different purpose. You end up maintaining multiple representations of the same model — and keeping them in sync manually.

TalkingSchema takes a different approach. The schema is defined once. Any export format is generated from that single source of truth, on demand, without touching the original.

Share Database Schema Diagrams with Anyone: Public Links and Iframe Embeds

Database schemas are collaborative artifacts. They're discussed in PR reviews, included in RFCs, pasted into architecture docs, and explained to stakeholders who've never written a line of SQL. But getting a non-engineer to open a migration file and understand it is asking a lot.

Public threads solve this with a shareable link to a read-only, interactive version of your schema — no account, no installation, no friction.