
[Feature] Include sources in `dbt list -s "fqn:*"`

Open dbeatty10 opened this issue 1 year ago • 5 comments

Is this your first time submitting a feature request?

  • [X] I have read the expectations for open source contributors
  • [X] I have searched the existing issues, and I could not find an existing issue for this feature
  • [X] I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

User story

As a developer on a dbt project, I sometimes want to define a selector in terms of "include everything except for ..." so that it is easy to write and includes precisely the desired nodes.

Known examples

  • dbt internal analytics project
  • https://github.com/dbt-labs/dbt-core/issues/9678#issuecomment-1966839715

One use case is defining a series of selectors that partition a dbt project. To make sure that everything is covered, the final selector would be defined as "everything that isn't one of the previously defined selectors".
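To make the partitioning idea concrete, here is a minimal Python sketch. It is a toy model using plain sets, not dbt's selection logic: the node names are borrowed from the example project in this issue, and `with_catch_all` and the `models` / `metrics` selector names are hypothetical.

```python
from typing import Dict, Set

# Node names borrowed from the example project in this issue (illustrative only).
ALL_NODES: Set[str] = {
    "my_project.my_model",
    "my_project.my_seed",
    "metric:my_project.my_metric",
    "source:my_project.my_src.my_seed",
}


def with_catch_all(everything: Set[str], named: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Add a final 'rest' selector: everything minus the named selectors."""
    claimed: Set[str] = set().union(*named.values())
    return {**named, "rest": everything - claimed}


# Hypothetical selectors that each claim part of the project.
named = {
    "models": {"my_project.my_model"},
    "metrics": {"metric:my_project.my_metric"},
}

# If "everything" really is every node, the partition has no gaps.
complete = with_catch_all(ALL_NODES, named)
covered = set().union(*complete.values())
print(sorted(ALL_NODES - covered))  # []

# If "everything" omits sources, the source falls through the partition.
everything_without_sources = {n for n in ALL_NODES if not n.startswith("source:")}
gappy = with_catch_all(everything_without_sources, named)
gap = ALL_NODES - set().union(*gappy.values())
print(sorted(gap))  # ['source:my_project.my_src.my_seed']
```

The catch-all is only as complete as the "everything" it starts from, which is why the starting selector needs to cover all resource types.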

Proposed solution

The easiest way to fulfill the user story above is to have a selection method that will select "all nodes". The most natural way to do that would be via "fqn:*" (as long as all node / resource types are included).

Describe the feature

When running dbt list -s "fqn:*", include all sources in the output.

For example, suppose I have project files like those described in https://github.com/dbt-labs/docs.getdbt.com/issues/4492#issuecomment-1881658603.

If I have the following source definition within models/_sources.yml, then I'd expect to be able to use the fqn method to select it.

sources:
  - name: my_src
    database: "{{ target.database }}"
    schema: "{{ target.schema }}"
    tables:
      - name: my_seed

Describe alternatives you've considered

Currently, the fqn method does not include sources, as this command shows:

dbt list -s "fqn:*"

Output:

01:09:56  Running with dbt=1.7.8
01:09:57  Registered adapter: postgres=1.7.8
01:09:57  Found 1 seed, 1 snapshot, 2 models, 1 analysis, 1 test, 1 source, 1 exposure, 1 metric, 401 macros, 1 group, 1 semantic model
exposure:my_project.my_exposure
metric:my_project.my_metric
my_project.metricflow_time_spine
my_project.my_model
my_project.my_seed
semantic_model:my_project.my_semantic_model
my_project.my_snapshot.my_snapshot
my_project.not_null_my_model_id

However, they are included in the output of this command:

dbt list --resource-types all

Output:

01:10:31  Running with dbt=1.7.8
01:10:32  Registered adapter: postgres=1.7.8
01:10:32  Found 1 seed, 1 snapshot, 2 models, 1 analysis, 1 test, 1 source, 1 exposure, 1 metric, 401 macros, 1 group, 1 semantic model
my_project.analysis.my_analysis
exposure:my_project.my_exposure
metric:my_project.my_metric
my_project.metricflow_time_spine
my_project.my_model
my_project.my_seed
semantic_model:my_project.my_semantic_model
my_project.my_snapshot.my_snapshot
source:my_project.my_src.my_seed
my_project.not_null_my_model_id
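To see exactly which nodes differ between the two commands, the listings can be compared as sets. The sketch below is just a convenience for reading the output pasted above: the node lines are copied from this issue, one log line is kept in each sample to show the filtering, and `listed_nodes` is a hypothetical helper that assumes log lines start with an `HH:MM:SS` timestamp.

```python
import re
from typing import Set

# Output of: dbt list -s "fqn:*" (copied from above, one log line kept)
FQN_STAR = """\
01:09:56  Running with dbt=1.7.8
exposure:my_project.my_exposure
metric:my_project.my_metric
my_project.metricflow_time_spine
my_project.my_model
my_project.my_seed
semantic_model:my_project.my_semantic_model
my_project.my_snapshot.my_snapshot
my_project.not_null_my_model_id
"""

# Output of: dbt list --resource-types all (copied from above, one log line kept)
RESOURCE_TYPES_ALL = """\
01:10:31  Running with dbt=1.7.8
my_project.analysis.my_analysis
exposure:my_project.my_exposure
metric:my_project.my_metric
my_project.metricflow_time_spine
my_project.my_model
my_project.my_seed
semantic_model:my_project.my_semantic_model
my_project.my_snapshot.my_snapshot
source:my_project.my_src.my_seed
my_project.not_null_my_model_id
"""

# Assumption: log lines begin with an HH:MM:SS timestamp, node lines do not.
LOG_LINE = re.compile(r"^\d{2}:\d{2}:\d{2}\s")


def listed_nodes(output: str) -> Set[str]:
    """Return the node names in a pasted `dbt list` output, skipping log lines."""
    return {
        line.strip()
        for line in output.splitlines()
        if line.strip() and not LOG_LINE.match(line)
    }


missing = listed_nodes(RESOURCE_TYPES_ALL) - listed_nodes(FQN_STAR)
for name in sorted(missing):
    print(name)
# my_project.analysis.my_analysis
# source:my_project.my_src.my_seed
```

So in this example project, the analysis is missing from the `fqn:*` listing as well as the source.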

Who will this benefit?

Here's an example of creating a default selector that is meant to include everything except certain models:

https://github.com/dbt-labs/dbt-core/issues/9678#issuecomment-1966839715

The user would like to use fqn:* to start with "everything" and then add specific exclusions from there.

Are you interested in contributing this feature?

No response

Anything else?

See also: https://github.com/dbt-labs/dbt-core/issues/9693

Related internal Slack thread: https://dbt-labs.slack.com/archives/C05FWBP9X1U/p1709217641798779

dbeatty10, Feb 28 '24 01:02

Potential fix:

It looks like sources are not included in the search strategy for the FQN selector (https://github.com/dbt-labs/dbt-core/blob/9d232398eed32caf07487b0df790bfd5f792e0c2/core/dbt/graph/selector_methods.py#L255-L263). I can change this to all_nodes, which should include sources.
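As a rough illustration of what that change would mean, here is a self-contained toy model. It is not the code in `selector_methods.py`: the `Node` class, the `search` function, and the `include_sources` flag are all hypothetical, and glob matching with `fnmatchcase` is a stand-in for dbt's actual FQN matching.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a manifest node."""
    resource_type: str
    fqn: Tuple[str, ...]


NODES = [
    Node("model", ("my_project", "my_model")),
    Node("seed", ("my_project", "my_seed")),
    Node("source", ("my_project", "my_src", "my_seed")),
]


def search(nodes: Iterable[Node], pattern: str, include_sources: bool) -> Iterator[Node]:
    """Yield nodes whose dotted FQN matches the glob pattern.

    include_sources=False models a search over non-source nodes only;
    include_sources=True models a search over all nodes.
    """
    for node in nodes:
        if node.resource_type == "source" and not include_sources:
            continue
        if fnmatchcase(".".join(node.fqn), pattern):
            yield node


print([n.resource_type for n in search(NODES, "*", include_sources=False)])
# ['model', 'seed']
print([n.resource_type for n in search(NODES, "*", include_sources=True)])
# ['model', 'seed', 'source']
```

The matching logic stays the same in both cases. Only the set of nodes being searched changes.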

aranke, Feb 29 '24 19:02

From internal Slack:

Sources have never been included in fqn:*, because they are selected as source:* instead. Only models/seeds/snapshots/tests are included by fqn.

@jtcohen6 do you know why ^?

That’s why the “default” node selection is so verbose.

Starting in 1.7, docs generate respects the node selection. So if there is a default yaml selector defined, that will now apply to the docs generate step too.

More context here

graciegoheen, Mar 11 '24 18:03

I think we should add sources (and analyses) to fqn:*. Reasons below.

Research

I've only been able to find two resource types that are not included by dbt list -s "fqn:*":

  1. sources
  2. analyses

Reprex

  1. Start with these project files
  2. Run dbt list -s "fqn:*"
    • 👉 Notice that exposures, semantic_models, and metrics are included (but sources and analyses are not)
  3. Then run dbt list --resource-types all
    • Notice that sources and analyses are included

Additional context

Quoting @jtcohen6 from https://github.com/dbt-labs/dbt-core/pull/8589#issuecomment-1711302455:

It does feel like there's a real opportunity for refactoring here. It feels odd that sources/exposures/semantic_models/metrics are "pointer" node types, as opposed to the "logical" node types (models/seeds/snapshots/tests/analyses), and only those are included by the fqn:* selection.

But I think that's all out of scope for something we want to backport to v1.6!

Including sources within fqn:*

Pros

It seems like we can add sources (and analyses) to fqn:* without users losing any flexibility:

Cons

Are there any negative consequences to including sources in fqn:*?

I don't know of any, but I could be overlooking something.

Follow-up refactoring opportunity

If we make it so that fqn:* includes all resource types, then we might also be able to simplify this: https://github.com/dbt-labs/dbt-core/blob/8a395e928d1016368712e80641642a57e59590b4/core/dbt/graph/cli.py#L24

to this:

DEFAULT_INCLUDES: List[str] = ["fqn:*"]

As it currently stands, "exposure:*", "metric:*", "semantic_model:*" might already be unnecessary.

dbeatty10, Mar 11 '24 21:03