dbt-sugar
dbt-sugar copied to clipboard
Leverage "Column Names as Contracts" ideas to (auto)generate column tests
Noel, one of our community members posted an article from Emily (Capital One) on our discord: https://discord.com/channels/752101657218908281/789211881117057064/807763141526028338
Which refers to the idea of using column names as contracts:
- https://emilyriederer.netlify.app/post/column-name-contracts/
- https://emilyriederer.netlify.app/post/convo-dbt/
The TLDR of these articles are that if you use strong column naming conventions such as: ID
, DT
, CAT
you can directly from the column name have some assumptions such as something may have certain values to expect (e.g for categorical columns main you have a finite amount or terms, IDs should be unique and not null, dates should be within a certain range etc.)
If this is something people choose to use in their data models, we should be able to auto generate tests for them.
Probably the easiest and most explicit thing to do in a first version is to give users a way to configure a series of mappings in a yaml document which could look like the following:
- auto_tests:
- pattern: '^ID_'
test_rules: ['unique', 'not_null']
- pattern: '^CAT_'
test_rules: ['allowable_values']
test_arguments: ['cat_a', 'cat_b']
dbt-sugar could then "evalute" (in a safe manner) these configs and propose tests based on those column conventions.
A similar start of an idea is in discussio in #24 in which we propose the idea that if a column is named as the primary key in the {{config()}}
of a model that we could automatically propose a ['unique', 'not_null']
test on this column to the user as they document things.
This could also lead to the birth of a new dbt-sugar task which could be called auto-tests
(or we'll find something sweeter later) which would iterate over all of the models of a dbt-project and add tests based on those user rules.
Later, later, we could offer a way for people to share their user pattern/configs so that people could maybe install "packages" of configs into their dbt-sugar projects.
I like the idea, but I am not sure the CAT_ test_arguments would work as every column this applies to may have different values.
I also wonder if calling this auto_tests is broad enough vs calling it something more generic like model_vocabulary and thinking about level 1, 2, and 3.
This way the vocabulary can be used to test that attribute names follow the convention, auto generate starting yml with column descriptions, and auto create the tests as suggested.