Grain (Data Warehouse)
Grain is the precise definition of what one row in a fact table represents. It is the single most important decision in dimensional modeling — every other design decision (which dimensions to include, which measures to add, how to handle aggregation) depends on the grain being correctly and unambiguously declared.
Ralph Kimball, whose Kimball Methodology is the dominant framework for dimensional modeling, defines the grain as the answer to this question:
"What does one row in this fact table represent?"
The answer must be specific, business-meaningful, and agreed upon before any other design work begins.
Why Grain Matters
The grain determines:
- What measurements are possible. A fact table at the order line item grain can measure revenue per product per order. A fact table at the daily summary grain cannot measure individual orders — that detail has been aggregated away.
- Which dimensions can be attached. Dimensions must apply to every row at the declared grain. If the grain is "one order line item," then
dim_productsapplies naturally. If the grain were "one monthly customer summary," attachingdim_productswould be problematic because a customer buys many products in a month. - How aggregation behaves. Analysts querying a fact table at the wrong grain will produce incorrect results — double-counting, incorrect averages, or missing detail.
- Data volume. A finer grain (one line item per order) produces more rows than a coarser grain (one order per day). This affects storage, query performance, and partitioning strategy.
Kimball's Three Steps for Dimensional Modeling
Ralph Kimball defines three mandatory steps that must be completed in order for every fact table:
Step 1 — Identify the business process
Choose the specific business process you are modeling. Business processes are activities your organization performs: taking orders, processing payments, shipping packages, recording page views. Each business process typically becomes its own fact table.
Example: The business process is sales order fulfillment.
Step 2 — Declare the grain
State precisely what one row represents. The grain declaration should be a single, unambiguous sentence. Common formulations:
- "One row per sales order line item"
- "One row per customer session"
- "One row per daily inventory balance per warehouse"
- "One row per hotel room-night booking"
Example: "One row per sales order line item."
Step 3 — Identify the dimensions
With the grain declared, list all the descriptive dimensions that apply to every row at that grain. Each dimension must be meaningful at the declared grain — it cannot sometimes apply and sometimes not.
Example at line item grain: dim_date (order date), dim_customer (who placed the order), dim_product (what was ordered), dim_store (where the order was placed), dim_promotion (what promotion applied, if any).
Grain Examples
Fine-grain example: Sales order line item
| Grain declaration | "One row per sales order line item" |
|---|---|
| Fact table | fct_order_line_items |
| Natural key | order_id + line_item_id |
| Measures | quantity, unit_price, extended_price, discount_amount |
| Dimensions | dim_date, dim_customer, dim_product, dim_store, dim_promotion |
This is a fine-grain fact table. Every individual product line on every order gets its own row. You can answer questions like: "What was the revenue from Product X sold in Store Y in Q3, after applying Promotion Z?"
Coarse-grain example: Daily sales summary
| Grain declaration | "One row per product per store per day" |
|---|---|
| Fact table | fct_daily_sales_summary |
| Natural key | product_id + store_id + date |
| Measures | total_quantity_sold, total_revenue, transaction_count |
| Dimensions | dim_date, dim_product, dim_store |
This is a coarse-grain fact table. Individual orders are invisible — only daily totals per product/store are stored. You cannot answer "which customer bought Product X on Day Y" because customer information is not meaningful at this grain.
Common Grain Mistakes
Mixing grains in a single fact table
The most dangerous grain mistake is mixing rows with different grains in the same fact table. For example:
fct_orders (PROBLEMATIC):
- Row type A: one row per order header (measures: order_total, tax, shipping)
- Row type B: one row per order line item (measures: quantity, unit_price)
This is a mixed-grain fact table. Some rows represent headers; others represent line items. Queries that sum quantity will include rows where quantity is NULL; joins to dim_product will be nonsensical for header rows. Aggregations will produce incorrect results.
Solution: Use separate fact tables for different grains. An fct_order_headers table for order-level measures and an fct_order_line_items table for line-item measures. They can share conformed dimensions.
Declaring grain too broadly
A grain of "one row per order" hides line-item detail. If analysts later need to analyze revenue by product, they cannot do so from a header-grain fact table. Err toward the finest grain that the source system supports. You can always aggregate up; you cannot disaggregate down.
Conflating grain with a key
The grain is a business-meaningful statement, not a technical description of the primary key. "One row per order_line_id" is technically accurate but tells you nothing about what order_line_id represents in business terms. Always state the grain in business language.
Declaring Grain in TalkingSchema
When designing a star schema with TalkingSchema's AI copilot, state the grain explicitly in your prompt. This produces better-structured fact tables.
Example prompt:
"Design a star schema for retail sales analytics. The business process is point-of-sale transactions. The grain is one row per sales transaction line item — meaning one product on one receipt at one register at one time. Include dimensions for date, customer, product, store, and promotion."
TalkingSchema will generate a fact table with the correct grain-level primary key, appropriate measures (quantity, unit price, extended price, discount), and properly connected dimension tables.
Checking grain in an existing schema:
"Review the fct_orders table. Is the grain consistent? Are there any mixed-grain indicators — for example, columns that apply to the order header but not to individual line items?"
Frequently Asked Questions
Can a fact table have multiple grains?
No. A fact table must have exactly one grain. If you need to store data at two different grains (e.g., order headers and order line items), use two separate fact tables. They can share conformed dimensions like dim_date, dim_customer, and dim_store.
What is the difference between grain and granularity?
In dimensional modeling, these terms are used interchangeably. "Grain" is Kimball's preferred term. Some practitioners use "granularity" to mean the same thing — the level of detail represented by each row. The important thing is to declare it precisely, not which word you use.
Should I always choose the finest possible grain?
Generally yes. Kimball's recommendation is to declare the grain at the most atomic (finest) level that the source system supports. Finer-grain fact tables can answer more questions. Coarser-grain summary fact tables can be added as performance aggregates on top of the atomic fact table — but the atomic table should exist as the foundation.
How does grain relate to slowly changing dimensions (SCDs)?
The grain affects how SCDs are joined to fact tables. If a dimension row changes (e.g., a customer moves to a different city), the SCD Type 2 record captures the historical version. The fact table row's foreign key to the dimension should point to the dimension row that was valid at the time the fact event occurred. This is why surrogate keys (not natural keys) are used in fact-to-dimension joins in dimensional modeling.