dbt-tpch
dbt-tpch copied to clipboard
dbt sample project for Snowflake using the `TPCH` dataset that ships as a shared database with Snowflake.
dbt TPC-H Sample
This is a dbt sample project for Snowflake using the TPC-H example dataset that ships as a shared database with Snowflake.
More details can be found on the TPC website and in the specification document.
The project is laid out as follows:
_schemacontains schema defintions and tests in one.ymlfile per model_sourcecontains source definitionsbasecontainsephemeralbase models that serve as wrappers around source models to define column names and data types where necessaryodsrepresents an Operational Data Store (ODS), i.e. a mostly normalized view of the data. These models may contain more columns than we may choose to publish to the dimensional data warehouse, but don't contain any reporting models.whrepresents the Dimensional Data Warehouse (WH). These models use a star schema methodology made up of fact (fct_*) and dimension (dim_*) tables. In addition, this schema contains report models (rpt_*) that combine fact and dimension tables for business reporting.
Profile
- Add a new profile to
~/.dbt/profiles.ymlcalledtpch.
tpch:
target: dev
outputs:
prod:
type: snowflake
threads: 8
account: <account>
user: <user>
password: <password>
role: <role>
database: <target_database>
warehouse: <snowflake_warehouse>
schema: <default_schema>
dev:
type: snowflake
threads: 8
account: <account>
user: <user>
password: <password>
role: <role>
database: <target_database>
warehouse: <snowflake_warehouse>
schema: <default_schema>
Packages
This project make use of the dbt_utils package, so you will need to call dbt deps before running any model to ensure dbt can combile all package macros.
Scaling Factor
Also, note that you can change the scaling factor of the TPCH dataset by switching the source database in _source/source_tpch.yml from the default of 10 to 100 or 1000 by changing the database name accordingly.
version: 2
sources:
- name: tpch
database: SNOWFLAKE_SAMPLE_DATA
schema: TPCH_SF10
loader: Snowflake
...
Snowflake Usage
Using an X-Small warehouse (1 credit / hour), the project currently runs in about 5 minutes against the TPCH_SF10 database.
- What is dbt?
- Read the dbt viewpoint
- Installation
- Join the chat on Slack for live questions and support.