dbt-core icon indicating copy to clipboard operation
dbt-core copied to clipboard

[CT-3467] add environment variable / flag to exclude certain resource types

Open graciegoheen opened this issue 2 years ago • 6 comments

Is this your first time submitting a feature request?

  • [X] I have read the expectations for open source contributors
  • [X] I have searched the existing issues, and I could not find an existing issue for this feature
  • [X] I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

We expect folks to want to set environment defaults for whether or not unit tests should be included in dbt build/dbt test:

environment include unit tests by default
prod no
non-prod maybe?
dev yes
CI yes

Folks have a similar need to exclude other resource types from certain environments ("don't run my snapshots in dev", etc.).

In order to support this exclusion for all resource types...

Acceptance Criteria (updated from discussion below)

  1. Create a new flag -> DBT_EXCLUDE_RESOURCE_TYPE / --exclude-resource-type / etc.

    • takes last cut after we've done existing selection (--select and --exclude)
    • --exclude-resource-type is mutually exclusive with --resource-type
    • default -> don't exclude ANY resource types
    • include/exclude influences what's selected in your commands, what shows up in your logs, etc; however, all resources should still be included in the manifest artifact
    • For consistency, we should have this functionality able to be set in all 3 places: env var, project flags, and CLI flag. Related issue -> https://github.com/dbt-labs/dbt-core/issues/7036
  2. Ideally, we'd also expand --resource-type to also be an "include resource types" environment variable, flag, etc (but that can be out of scope of this issue if needed)

Describe alternatives you've considered

Alternative 1 -> Always remember to specify --select test_type:...

  • annoying :(

Alternative 2 -> Setting enabled equal to an environment variable for all unit tests:

...
unit_tests:
	my_project:
		+enabled: {{env_var('DBT_INCLUDE_UNIT_TESTS', true)}}
  • all unit tests would not be included in our metadata artifacts for the environments where DBT_INCLUDE_UNIT_TESTS = false :(

graciegoheen avatar Dec 06 '23 17:12 graciegoheen

This makes sense to add a method to configure this environment-wide 'cause always needing to remember to add --select test_type:... could get tedious.

Options

It seems like there are two main options for environment variable(s):

  1. Single multi-choice environment variable:
    • INCLUDE_TEST_TYPES - one or more test types (i.e., data unit)
  2. Multiple boolean environment variables:
    • DBT_INCLUDE_UNIT_TEST - select (or de-select) unit tests (possibly with associated CLI flag of --include-unit-test / --no-include-unit-test)
    • DBT_INCLUDE_DATA_TEST - select (or de-select) data tests
    • etc.

Comparison

Both options would allow for many different test types. The former would only need a single environment variable and the latter would need to add a new environment variable along with each new test type.

Similarly, both options could also be included in UserConfig (if we choose).

The latter option would be a bit easier to flip a single type on/off with either an environment variable or a CLI flag.

dbeatty10 avatar Dec 06 '23 19:12 dbeatty10

Option 1: INCLUDE_TEST_TYPES - default is data, unit

environment value
prod ["data"]
ci ["data", "unit"]
dev ["data", "unit"]

Option 2: TEST_TYPES_SELECTION - default is {include: all}

environment value
prod {"include": ["data"]} OR {"exclude": ["unit"]}
ci {"include": "all"} OR {"include": ["data", "unit"]}
dev {"include": "all"} OR {"include": ["data", "unit"]}

Option 3: INCLUDE_UNIT_TESTS - default is true

environment value
prod false
ci true
dev true

Note: If you explicitly passed --select test_type:..., we would use the passed-in selector RATHER than the environment variable. We can think of the environment variable as being the "default" for the environment.

^^ is this actually possible?

Our thoughts:

  • you set this once, then forget about it
  • option 3 is the easiest to set "just set it to be false in prod"
  • option 2 is most similar to WARN_ERROR_OPTIONS
  • option 2 is a big gross (curly brackets! lists! oh my!)
  • if we go for option 3, would we need an equivalent environment variable for every test type INCLUDE_DATA_TEST (would you ever want to exclude your data tests??)

HYPOTHESIS -> We think it's likely that there will never be more than 5 test types, maybe there's only 2 test types forever. Thus, we think option 3 is best, and options 1 & 2 are probably over-engineered given we're not going to have tons of test types.

graciegoheen avatar Dec 14 '23 22:12 graciegoheen

Setting it in an environment variable is inconsistent from how we do this for other aspects. Historically we have set it in all 3.

martynydbt avatar Dec 18 '23 17:12 martynydbt

As discussed with @gshank and @dbeatty10.... For consistency, we should have this functionality able to be set in all 3 places: env var, project flags, and CLI flag. Related issue -> https://github.com/dbt-labs/dbt-core/issues/7036

graciegoheen avatar Jan 03 '24 00:01 graciegoheen

Similar to the include_saved_query flag https://github.com/dbt-labs/dbt-core/blob/c12f6fbf4db774356bd1472045ef4e1873969b08/core/dbt/cli/params.py#L409

Alternatively we could add a new environment variable for DBT_RESOURCE_TYPE (since we already have the --resource-type CLI option)

graciegoheen avatar Feb 13 '24 00:02 graciegoheen

Proposed new acceptance criteria to expand this solution to all resource types you may want to exclude from certain environments

  1. Create a new flag -> DBT_EXCLUDE_RESOURCE_TYPE / --exclude-resource-type / etc.

    • takes last cut after we've done existing selection (--select and --exclude)
    • --exclude-resource-type is mutually exclusive with --resource-type
    • default -> don't exclude ANY resource types
    • include/exclude influences what's selected in your commands, what shows up in your logs, etc; however, all resources should still be included in the manifest artifact
  2. Ideally, we'd also expand --resource-type to also be an "include resource types" environment variable, flag, etc (but that can be out of scope of this issue if needed)

  3. Rename include_saved_query to export_saved_query

    • keep old name for backwards compatibility
    • always include saved queries, but they don't do anything unless you set export_saved_query to true - this is not about excluding/including saved queries, but about whether or not the exports are run
    • when export_saved_query is false, you will see the saved queries in your log but explicitly called out as a no-op (this feels less hidden)

This will allow folks to configure:

  • exclude my unit tests in prod DBT_EXCLUDE_RESOURCE_TYPE = ['unit_test']
  • don't export your saved queries in any env except prod DBT_EXPORT_SAVED_QUERY = false
  • exclude snapshots in dev DBT_EXCLUDE_RESOURCE_TYPE= ['snapshot']
  • etc.

cc: @jtcohen6 @dbeatty10 @Jstein77

graciegoheen avatar Feb 22 '24 16:02 graciegoheen

Would be great to add docs here on best practices for what to set this environment variable to (for ci, prod, etc.)

graciegoheen avatar Feb 26 '24 20:02 graciegoheen

After various conversations, we decided that it didn't make a lot of sense to have this be settable in project flags, since it's really command specific, not project specific. So we're going to implement env vars and cli params. This applies to three commands: build, list, and clone.

gshank avatar Mar 12 '24 20:03 gshank

Does this not apply for run or test?

graciegoheen avatar Mar 12 '24 21:03 graciegoheen

No. The only commands that use this are those three.

gshank avatar Mar 13 '24 15:03 gshank

I guess you could change that... but that would be a different ticket, I think.

gshank avatar Mar 13 '24 15:03 gshank

Hello! How do you actually achieve this:

exclude snapshots in dev DBT_EXCLUDE_RESOURCE_TYPE= ['snapshot']

we also want to exclude them but couldn't figure out how other than maybe adding enabled: "{{ var('is_prod', false) | as_bool }}" or something in the config of snapshots.

polmonso avatar Jul 11 '24 09:07 polmonso

You can use the --exclude-resource-type flag (docs here) or the associated environment variable DBT_EXCLUDE_RESOURCE_TYPES (see here).

graciegoheen avatar Jul 11 '24 15:07 graciegoheen

How do you use it so it so it applies it only when target == prod? It's the exclusion depending on the target that I couldn't figure out.

polmonso avatar Jul 11 '24 16:07 polmonso

You can set the environment variable equal to something different in each of your environments. Here's some docs on how this works in dbt Cloud.

graciegoheen avatar Jul 12 '24 18:07 graciegoheen

In our case the environment doesn't change, only the target. So it'll go in the line of setting the exclude resource type variable depending on the target using jinja in the dbt_project.yml. Thanks for the pointers!

polmonso avatar Jul 15 '24 09:07 polmonso