Slowly Changing Dimension (SCD)
For a deeper take on where SCD typing decisions actually belong — and why making them in the schema beats making them in your ETL pipeline — see Slowly Changing Dimensions: The Decision Belongs in Your Schema, Not Your ETL Code
A slowly changing dimension (SCD) is a dimension table where the attribute values change over time, but infrequently enough that the changes require a deliberate design strategy rather than simple overwrite.
The canonical examples:
- A customer moves from New York to Boston — their
cityattribute changes - A product is reclassified from "Electronics" to "Accessories" — its
categoryattribute changes - An employee is promoted — their
job_titleanddepartmentchange
The design question is: how do you want the data warehouse to treat these historical changes? Should a customer's historical sales reflect their current address or the address they had at the time of purchase? Should last quarter's product revenue use the product's current category or the category it was in last quarter?
The answer depends on the business question you need to answer. SCD types provide standardized design patterns for each case.
SCD Type 0 — Retain Original
What it does: The attribute value is fixed at the time the record was first loaded and never updated, regardless of source system changes.
When to use it: Attributes that should reflect the original state, not the current state. Examples: a customer's original_signup_channel, a product's introduction_date, an employee's hire_date.
Implementation: No special columns needed — the row is never updated.
SCD Type 1 — Overwrite
What it does: When an attribute changes, the old value is overwritten with the new value. No history is kept.
When to use it: When historical accuracy is not required for this attribute and you always want the current value. Examples: correcting a data entry error, updating a phone number where the old value was never correct.
Implementation:
UPDATE dim_customers
SET city = 'Boston'
WHERE customer_bk = 'CRM-5542';
Trade-off: Historical fact rows that pointed to this customer will now show "Boston" for all time — even orders placed when the customer lived in New York. If you re-run historical reports, they will show different results than when they were originally run. This is acceptable only when historical accuracy of this attribute is not needed.
SCD Type 2 — Add New Row (Full History)
What it does: When an attribute changes, the current row is "expired" (marked inactive with an end date) and a new row is inserted with the updated attribute values and a new surrogate key.
When to use it: When you need to accurately represent the attribute value at the time a historical transaction occurred. This is the most common and most important SCD type. Examples: customer address, customer segment, product category, employee department.
Implementation:
The dim_customers table for SCD Type 2:
CREATE TABLE dim_customers (
customer_sk SERIAL PRIMARY KEY,
customer_bk VARCHAR(50) NOT NULL, -- natural key from source system
full_name VARCHAR(255),
city VARCHAR(100),
country VARCHAR(100),
customer_segment VARCHAR(50),
effective_date DATE NOT NULL,
expiry_date DATE, -- NULL means "currently active"
is_current BOOLEAN NOT NULL DEFAULT TRUE,
row_version INTEGER NOT NULL DEFAULT 1
);
When a customer moves from New York to Boston:
-- Step 1: Expire the current row
UPDATE dim_customers
SET expiry_date = '2024-06-14',
is_current = FALSE
WHERE customer_bk = 'CRM-5542'
AND is_current = TRUE;
-- Step 2: Insert the new version
INSERT INTO dim_customers
(customer_bk, full_name, city, country, customer_segment,
effective_date, expiry_date, is_current, row_version)
VALUES
('CRM-5542', 'Jane Smith', 'Boston', 'USA', 'Gold',
'2024-06-15', NULL, TRUE, 2);
Result:
| customer_sk | customer_bk | city | effective_date | expiry_date | is_current | row_version |
|---|---|---|---|---|---|---|
| 1001 | CRM-5542 | New York | 2022-03-01 | 2024-06-14 | FALSE | 1 |
| 1087 | CRM-5542 | Boston | 2024-06-15 | NULL | TRUE | 2 |
Orders placed before June 15, 2024 point to customer_sk = 1001 (New York). Orders placed after June 15, 2024 point to customer_sk = 1087 (Boston). Historical reports are accurate.
ETL lookup for loading fact rows: When loading a new order, look up the current surrogate key:
SELECT customer_sk
FROM dim_customers
WHERE customer_bk = 'CRM-5542'
AND is_current = TRUE;
Trade-off: Query complexity increases slightly. Fact table rows cannot be re-pointed to a different dimension row — the historical snapshot is permanent. ETL must correctly manage the expire/insert cycle.
SCD Type 3 — Add Previous Value Column
What it does: Adds a previous_value column alongside the current_value column. When an attribute changes, the old current value moves to previous_value and the new value becomes current_value. Only one level of history is preserved.
When to use it: When you need limited history (one change back) and can accept losing any older history. Example: tracking the current and previous sales region of a salesperson.
Implementation:
ALTER TABLE dim_customers
ADD COLUMN previous_city VARCHAR(100),
ADD COLUMN city_change_date DATE;
-- On change:
UPDATE dim_customers
SET previous_city = city,
city = 'Boston',
city_change_date = '2024-06-15'
WHERE customer_bk = 'CRM-5542';
Trade-off: Only one historical value is retained. If the customer moves a second time, the first historical value is permanently lost. This is rarely the right choice for dimensions that change multiple times.
SCD Type 4 — Mini-Dimension (History Table)
What it does: The frequently changing attributes are split off into a separate "mini-dimension" table. The main dimension table retains only the stable attributes. The mini-dimension gets its own surrogate key and is versioned independently.
When to use it: When a subset of a dimension's attributes changes very frequently (e.g., customer purchasing behavior scores, credit ratings), and tracking every change in the main dimension table would bloat the row count.
Implementation:
-- Main dimension: stable attributes
CREATE TABLE dim_customers (
customer_sk SERIAL PRIMARY KEY,
customer_bk VARCHAR(50),
full_name VARCHAR(255),
signup_date DATE
);
-- Mini-dimension: frequently changing behavioral attributes
CREATE TABLE dim_customer_profile (
profile_sk SERIAL PRIMARY KEY,
income_band VARCHAR(20),
purchase_frequency VARCHAR(20),
credit_score_band VARCHAR(20),
effective_date DATE,
expiry_date DATE,
is_current BOOLEAN
);
SCD Type 6 — Hybrid (Types 1 + 2 + 3 Combined)
What it does: Combines Type 1 (overwrite current attributes), Type 2 (insert new row for history), and Type 3 (add previous value column) into a single design. Each version row has both a current_value column (always showing the latest value for easy querying) and a historical_value column (showing the value at the time of that version).
Why "Type 6": 1 + 2 + 3 = 6.
When to use it: When you need full historical accuracy (Type 2) but also want to enable simple queries against the current value without self-joining to find the is_current row (Type 1) and want one-level-back history on the same row (Type 3). More complex to maintain but more flexible to query.
Choosing the Right SCD Type
| Requirement | Recommended SCD Type |
|---|---|
| No history needed; always use current value | Type 1 |
| Full history required; attribute at time of transaction | Type 2 (most common) |
| Keep only one previous value; two states is sufficient | Type 3 |
| Very high-frequency changes; separate the volatile data | Type 4 |
| Full history + easy current-value queries | Type 6 |
| Attribute should never change after initial load | Type 0 |
In practice, SCD Type 2 is the default for most dimension attributes that change over time. Types 0, 1, and 2 cover the vast majority of real-world cases.
Generating an SCD Type 2 Dimension in TalkingSchema
Example prompt:
"Design a dim_customers table with SCD Type 2 history tracking. Track changes to city, country, and customer_segment. Use a surrogate key, preserve the natural key from the CRM, and add standard SCD Type 2 fields: effective_date, expiry_date, is_current, and row_version."
TalkingSchema will generate the full table definition with all required SCD Type 2 columns, proper indexing on customer_bk and is_current for ETL lookups, and a comment on the ETL pattern.
Adding SCD Type 2 to an existing dimension:
"Update the dim_customers table to add SCD Type 2 support. The table currently has a simple primary key on customer_id. Add effective_date, expiry_date, and is_current columns, and generate a migration script."
Frequently Asked Questions
Why is SCD Type 2 the most commonly used type?
SCD Type 2 is the only type that preserves the full attribute history needed for accurate historical reporting. If a customer was in segment "Gold" during Q3 and is now in segment "Bronze," SCD Type 2 ensures that Q3 revenue reports still show Gold-segment customer revenue — which is the business truth at that point in time. No other SCD type provides this guarantee without data loss.
Does SCD Type 2 make queries more complex?
Slightly. To query current-state data, you add WHERE is_current = TRUE. To query point-in-time data, you filter on effective_date <= target_date AND (expiry_date > target_date OR expiry_date IS NULL). Most BI tools and dbt models handle this as a standard pattern.
What is the performance impact of SCD Type 2?
SCD Type 2 increases the row count of a dimension table proportional to how often attributes change. For dimensions where attributes change infrequently (a few times per year), the row count increase is modest and has negligible performance impact. For dimensions with very high change frequency, consider SCD Type 4 to separate the volatile attributes.
Can I implement SCD Type 2 with dbt?
Yes. The dbt_utils package provides a snapshot feature specifically for SCD Type 2 tracking. You can also manage it directly with dbt snapshot blocks in your dbt project. TalkingSchema can generate the schema definition; dbt manages the incremental snapshot loading logic.
How do I handle deletes in SCD Type 2?
When a source record is deleted (e.g., a customer account is closed), the typical approach is a soft delete: expire the current dimension row (set expiry_date and is_current = FALSE) and optionally add a is_deleted flag. Do not physically delete the dimension row — historical fact rows still reference that surrogate key and the join must remain valid.