[CT-3467] add environment variable / flag to exclude certain resource types
Is this your first time submitting a feature request?
- [X] I have read the expectations for open source contributors
- [X] I have searched the existing issues, and I could not find an existing issue for this feature
- [X] I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion
Describe the feature
We expect folks to want to set environment defaults for whether or not unit tests should be included in dbt build/dbt test:
| environment | include unit tests by default |
|---|---|
| prod | no |
| non-prod | maybe? |
| dev | yes |
| CI | yes |
Folks have a similar need to exclude other resource types from certain environments ("don't run my snapshots in dev", etc.).
In order to support this exclusion for all resource types...
Acceptance Criteria (updated from discussion below)
-
Create a new flag ->
DBT_EXCLUDE_RESOURCE_TYPE/--exclude-resource-type/ etc.- takes last cut after we've done existing selection (
--selectand--exclude) --exclude-resource-typeis mutually exclusive with--resource-type- default -> don't exclude ANY resource types
- include/exclude influences what's selected in your commands, what shows up in your logs, etc; however, all resources should still be included in the manifest artifact
- For consistency, we should have this functionality able to be set in all 3 places: env var, project flags, and CLI flag. Related issue -> https://github.com/dbt-labs/dbt-core/issues/7036
- takes last cut after we've done existing selection (
-
Ideally, we'd also expand
--resource-typeto also be an "include resource types" environment variable, flag, etc (but that can be out of scope of this issue if needed)
Describe alternatives you've considered
Alternative 1 -> Always remember to specify --select test_type:...
- annoying :(
Alternative 2 -> Setting enabled equal to an environment variable for all unit tests:
...
unit_tests:
my_project:
+enabled: {{env_var('DBT_INCLUDE_UNIT_TESTS', true)}}
- all unit tests would not be included in our metadata artifacts for the environments where
DBT_INCLUDE_UNIT_TESTS = false:(
This makes sense to add a method to configure this environment-wide 'cause always needing to remember to add --select test_type:... could get tedious.
Options
It seems like there are two main options for environment variable(s):
- Single multi-choice environment variable:
INCLUDE_TEST_TYPES- one or more test types (i.e.,data unit)
- Multiple boolean environment variables:
DBT_INCLUDE_UNIT_TEST- select (or de-select) unit tests (possibly with associated CLI flag of--include-unit-test/--no-include-unit-test)DBT_INCLUDE_DATA_TEST- select (or de-select) data tests- etc.
Comparison
Both options would allow for many different test types. The former would only need a single environment variable and the latter would need to add a new environment variable along with each new test type.
Similarly, both options could also be included in UserConfig (if we choose).
The latter option would be a bit easier to flip a single type on/off with either an environment variable or a CLI flag.
Option 1: INCLUDE_TEST_TYPES - default is data, unit
| environment | value |
|---|---|
| prod | ["data"] |
| ci | ["data", "unit"] |
| dev | ["data", "unit"] |
Option 2: TEST_TYPES_SELECTION - default is {include: all}
| environment | value |
|---|---|
| prod | {"include": ["data"]} OR {"exclude": ["unit"]} |
| ci | {"include": "all"} OR {"include": ["data", "unit"]} |
| dev | {"include": "all"} OR {"include": ["data", "unit"]} |
Option 3: INCLUDE_UNIT_TESTS - default is true
| environment | value |
|---|---|
| prod | false |
| ci | true |
| dev | true |
Note: If you explicitly passed --select test_type:..., we would use the passed-in selector RATHER than the environment variable. We can think of the environment variable as being the "default" for the environment.
^^ is this actually possible?
Our thoughts:
- you set this once, then forget about it
- option 3 is the easiest to set "just set it to be false in prod"
- option 2 is most similar to
WARN_ERROR_OPTIONS - option 2 is a big gross (curly brackets! lists! oh my!)
- if we go for option 3, would we need an equivalent environment variable for every test type
INCLUDE_DATA_TEST(would you ever want to exclude your data tests??)
HYPOTHESIS -> We think it's likely that there will never be more than 5 test types, maybe there's only 2 test types forever. Thus, we think option 3 is best, and options 1 & 2 are probably over-engineered given we're not going to have tons of test types.
Setting it in an environment variable is inconsistent from how we do this for other aspects. Historically we have set it in all 3.
As discussed with @gshank and @dbeatty10.... For consistency, we should have this functionality able to be set in all 3 places: env var, project flags, and CLI flag. Related issue -> https://github.com/dbt-labs/dbt-core/issues/7036
Similar to the include_saved_query flag https://github.com/dbt-labs/dbt-core/blob/c12f6fbf4db774356bd1472045ef4e1873969b08/core/dbt/cli/params.py#L409
Alternatively we could add a new environment variable for DBT_RESOURCE_TYPE (since we already have the --resource-type CLI option)
Proposed new acceptance criteria to expand this solution to all resource types you may want to exclude from certain environments
-
Create a new flag ->
DBT_EXCLUDE_RESOURCE_TYPE/--exclude-resource-type/ etc.- takes last cut after we've done existing selection (
--selectand--exclude) --exclude-resource-typeis mutually exclusive with--resource-type- default -> don't exclude ANY resource types
- include/exclude influences what's selected in your commands, what shows up in your logs, etc; however, all resources should still be included in the manifest artifact
- takes last cut after we've done existing selection (
-
Ideally, we'd also expand
--resource-typeto also be an "include resource types" environment variable, flag, etc (but that can be out of scope of this issue if needed) -
Rename
include_saved_querytoexport_saved_query- keep old name for backwards compatibility
- always include saved queries, but they don't do anything unless you set
export_saved_querytotrue- this is not about excluding/including saved queries, but about whether or not the exports are run - when
export_saved_queryis false, you will see the saved queries in your log but explicitly called out as a no-op (this feels less hidden)
This will allow folks to configure:
- exclude my unit tests in prod
DBT_EXCLUDE_RESOURCE_TYPE = ['unit_test'] - don't export your saved queries in any env except prod
DBT_EXPORT_SAVED_QUERY = false - exclude snapshots in dev
DBT_EXCLUDE_RESOURCE_TYPE= ['snapshot'] - etc.
cc: @jtcohen6 @dbeatty10 @Jstein77
Would be great to add docs here on best practices for what to set this environment variable to (for ci, prod, etc.)
After various conversations, we decided that it didn't make a lot of sense to have this be settable in project flags, since it's really command specific, not project specific. So we're going to implement env vars and cli params. This applies to three commands: build, list, and clone.
Does this not apply for run or test?
No. The only commands that use this are those three.
I guess you could change that... but that would be a different ticket, I think.
Hello! How do you actually achieve this:
exclude snapshots in dev DBT_EXCLUDE_RESOURCE_TYPE= ['snapshot']
we also want to exclude them but couldn't figure out how other than maybe adding enabled: "{{ var('is_prod', false) | as_bool }}" or something in the config of snapshots.
You can use the --exclude-resource-type flag (docs here) or the associated environment variable DBT_EXCLUDE_RESOURCE_TYPES (see here).
How do you use it so it so it applies it only when target == prod? It's the exclusion depending on the target that I couldn't figure out.
You can set the environment variable equal to something different in each of your environments. Here's some docs on how this works in dbt Cloud.
In our case the environment doesn't change, only the target. So it'll go in the line of setting the exclude resource type variable depending on the target using jinja in the dbt_project.yml. Thanks for the pointers!