TalkingSchema + dbt Workflow
TalkingSchema and dbt serve different but complementary roles in the analytics engineering pipeline. Understanding exactly where each tool fits — and what each tool does not do — is the starting point for using them together effectively.
TalkingSchema: Designs the data warehouse schema. Answers what tables should exist, how they should be structured, and what the relationships between them are. Generates ERDs, dimensional models, SQL DDL, and ORM exports from plain-language requirements.
dbt: Transforms data inside the warehouse. Answers how raw source data becomes analytics-ready mart data. Manages SQL SELECT model dependencies, runs data quality tests, and generates documentation.
Neither tool replaces the other. A data warehouse project needs both: a designed structure (TalkingSchema) and a transformation layer to populate it (dbt).
TalkingSchema does not run dbt, connect to your dbt project, or manage dbt configurations. The workflow described here is code-based: TalkingSchema generates SQL code blocks and schema definitions that you copy into your dbt project. A deeper integration (e.g., pushing models directly to a repository) is on the roadmap.
How the Tools Fit Together
Business Requirements
↓
[TalkingSchema — Design]
• Star schema ERD
• Fact and dimension table DDL
• dbt model SQL (via AI chat)
↓
[dbt — Transform]
• staging/ models
• intermediate/ models
• marts/ models
↓
Analytics-Ready Data Warehouse
The design work (TalkingSchema) must precede the transformation work (dbt) — you cannot write transformation models for tables that have not been designed. In practice, these two activities often happen iteratively: you design an initial mart layer structure in TalkingSchema, write initial dbt models, then return to TalkingSchema as requirements evolve.
Worked Example: E-Commerce Orders
This example walks through the full workflow for an e-commerce company's order analytics use case.
Step 1 — Design the data warehouse schema in TalkingSchema
Start by describing the business domain and desired analytical capability in TalkingSchema's chat interface:
Prompt:
"I'm building a data warehouse for an e-commerce company. The primary business process is order line item revenue analysis. Design a Kimball-style star schema with:
- A fact table at the order line item grain
- dim_date (conformed, with fiscal year support)
- dim_customers (SCD Type 2, track changes to city and customer_segment)
- dim_products (include product_category and brand hierarchy)
- dim_promotions (nullable — not every order has a promotion)
Use surrogate keys (SERIAL) on all dimension tables. Use BIGSERIAL on the fact table."
TalkingSchema generates the full star schema:
fct_order_line_items
CREATE TABLE fct_order_line_items (
order_line_sk BIGSERIAL PRIMARY KEY,
order_date_sk INTEGER NOT NULL REFERENCES dim_date(date_sk),
customer_sk INTEGER NOT NULL REFERENCES dim_customers(customer_sk),
product_sk INTEGER NOT NULL REFERENCES dim_products(product_sk),
promotion_sk INTEGER REFERENCES dim_promotions(promotion_sk),
-- Source keys (degenerate dimensions)
order_id VARCHAR(50) NOT NULL,
line_item_id VARCHAR(50) NOT NULL,
-- Measures
quantity INTEGER NOT NULL,
unit_price NUMERIC(10, 2) NOT NULL,
discount_amount NUMERIC(10, 2) NOT NULL DEFAULT 0,
extended_price NUMERIC(10, 2) NOT NULL,
-- Metadata
_loaded_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
dim_customers (SCD Type 2)
CREATE TABLE dim_customers (
customer_sk SERIAL PRIMARY KEY,
customer_bk VARCHAR(50) NOT NULL,
full_name VARCHAR(255),
email VARCHAR(255),
city VARCHAR(100),
country VARCHAR(100),
customer_segment VARCHAR(50),
effective_date DATE NOT NULL,
expiry_date DATE,
is_current BOOLEAN NOT NULL DEFAULT TRUE,
row_version INTEGER NOT NULL DEFAULT 1
);
Apply the DDL to your warehouse (Snowflake, BigQuery, Redshift, PostgreSQL) to create the target tables.
Step 2 — Design the source staging layer
Before mart models can run, dbt needs staging models that clean and standardize raw source data. Use TalkingSchema's AI to understand your source schema:
Prompt:
"I have a raw orders table from my Shopify export. It has these columns:
id,created_at(string in ISO 8601),customer_id,customer_email,customer_city,product_id,product_title,sku,quantity(string),price(string),discount(string or null),promotion_code(nullable string). Design a clean staging layer schema for this data."
TalkingSchema generates a stg_orders staging table definition with correct data types and a clean column naming convention — before you write a single line of dbt SQL.
Step 3 — Generate dbt staging model SQL
With the source and target schemas understood, generate the dbt model SQL in TalkingSchema's chat:
Prompt:
"Generate a dbt staging model for the raw Shopify orders table described above. Requirements:
- Use snake_case column names
- Cast all columns explicitly to correct types (dates to DATE, prices to NUMERIC(10,2), quantity to INTEGER)
- Add a surrogate key using
dbt_utils.generate_surrogate_key(['id', 'line_item_id'])- Add a
_loaded_atcolumn withcurrent_timestamp- Handle the NULL discount column with COALESCE to 0
- Source name: shopify, identifier: raw_orders"
TalkingSchema returns a complete SQL model in a code block:
-- models/staging/stg_orders.sql
with source as (
select * from {{ source('shopify', 'raw_orders') }}
),
renamed as (
select
id::varchar as order_id,
created_at::timestamp as created_at,
created_at::date as order_date,
customer_id::varchar as customer_bk,
customer_email::varchar as customer_email,
lower(trim(customer_city))::varchar as customer_city,
product_id::varchar as product_bk,
product_title::varchar as product_name,
sku::varchar as product_sku,
quantity::integer as quantity,
price::numeric(10, 2) as unit_price,
coalesce(nullif(discount, '')::numeric(10, 2), 0.00) as discount_amount,
(price::numeric(10, 2) * quantity::integer)
- coalesce(nullif(discount, '')::numeric(10, 2), 0.00) as extended_price,
promotion_code::varchar as promotion_code,
{{ dbt_utils.generate_surrogate_key(['id']) }} as order_sk,
current_timestamp as _loaded_at
from source
)
select * from renamed
Copy this into models/staging/stg_orders.sql in your dbt project.
Step 4 — Generate the mart layer dbt models
With staging models in place, generate the fact table population model:
Prompt:
"Generate a dbt mart model for fct_order_line_items that joins stg_orders to the dimension tables using surrogate key lookups. The fact table grain is one row per order. Include left joins to dim_promotions (nullable). Use
ref()for all upstream model references andsource()for raw tables."
-- models/marts/fct_order_line_items.sql
with orders as (
select * from {{ ref('stg_orders') }}
),
dim_date as (
select * from {{ ref('dim_date') }}
),
dim_customers as (
select * from {{ ref('dim_customers') }}
where is_current = true
),
dim_products as (
select * from {{ ref('dim_products') }}
),
dim_promotions as (
select * from {{ ref('dim_promotions') }}
)
select
{{ dbt_utils.generate_surrogate_key(['o.order_id']) }} as order_line_sk,
d.date_sk as order_date_sk,
c.customer_sk,
p.product_sk,
promo.promotion_sk,
o.order_id,
o.quantity,
o.unit_price,
o.discount_amount,
o.extended_price,
current_timestamp as _loaded_at
from orders o
left join dim_date d on o.order_date = d.calendar_date
left join dim_customers c on o.customer_bk = c.customer_bk
left join dim_products p on o.product_bk = p.product_bk
left join dim_promotions promo on o.promotion_code = promo.promotion_code