dbt-project-evaluator icon indicating copy to clipboard operation
dbt-project-evaluator copied to clipboard

Table name `fct_multiple_sources_joined` could be misleading as it includes unioned sources as well

Open b-per opened this issue 2 years ago • 5 comments

The table fct_multiple_sources_joined contains the models joining from multiple sources, as expected, but it also contains models that are unioning from multiple sources (which in some cases might make sense).

We could potentially rename the model from fct_multiple_sources_joined to fct_multiple_sources_joined_or_unioned. But at the same time, it would be good to have a way to flag the unions vs joins.

In our Internal Analytics project the current convention is to prepend the stg/base model name with unioned

b-per avatar Feb 14 '22 11:02 b-per

I agree the name of this model is confusing. I could be wrong... but I think union-ing directly from the source would still be bad, and this is where we would recommend using base models (Ex: The base models have a 1:1 relationship with the source and then the staging model unions together the relevant base models). If this is true, we could just change the table name to fct_multiple_sources_joined_or_unioned or even fct_multiple_direct_references_to_sources without worrying about flagging. What use case were you thinking of for flagging the unions vs. joins?

graciegoheen avatar Feb 14 '22 15:02 graciegoheen

The "flag" was to potentially remove the models that end with unioned but in that case people need to stick to that convention and we need to agree if we want to remove them or not.

The 1 to 1 between source and base/stg is mostly valid but I think that in some cases, when we loop through similar sources with Jinja to union them, having 1 to 1 relationship means that every time we have a new source we need to create a base model and not just modify our staging macro.

b-per avatar Feb 14 '22 18:02 b-per

Agree with @b-per here -- this is definitely a pattern I have seen (and used!) for when there are identical data sources across schemas -- union from the sources directly, then stage the data once instead of multiple times. Claire wrote a discourse article about this a while back.

Not sure what I think the solution here is, but I do think we will run across this pattern for sure

dave-connors-3 avatar Feb 22 '22 21:02 dave-connors-3

Do we still want to rename this model?

graciegoheen avatar Sep 21 '22 14:09 graciegoheen

I feel like it, yes. A few ideas for new names:

  • fct_multiple_sources_used_in_model
  • fct_multiple_sources_in_single_model
  • fct_multiple_sources_in_model

The initial discussion about union vs join and the "flag" is already available with the ability to define exceptions now. So people could easily exclude all the %_unioned from being reported.

b-per avatar Sep 22 '22 07:09 b-per