dbt-utils icon indicating copy to clipboard operation
dbt-utils copied to clipboard

Create/update a macro to verify sum of row counts of n relations

Open aranke opened this issue 1 year ago • 3 comments

Describe the feature

There are situations where a model simply is a union all of various other models:

-- int_my_source
select * from {{ ref('stg_my_source_a') }}
union all
select * from {{ ref('stg_my_source_b') }}
union all
select * from {{ ref('stg_my_source_c') }}

In this example, it would be useful to have a 1:many test to verify that:

count(int_my_source) = count(stg_my_source_a) + count(stg_my_source_b) + count(stg_my_source_c)

Describe alternatives you've considered

equal_rowcount is the closest thing in dbt-utils, but only does 1:1 relations

Additional context

I don't think this feature is database-specific.

Who will this benefit?

This feature has been requested several times over the years (dbt internal Slack, previous GitHub issue); this feature will benefit those users.

Are you interested in contributing to this feature?

Sure, happy to help.

aranke avatar Jul 25 '24 19:07 aranke

@aranke Thanks for raising this issue and linking those two other places where this has come up!

If the model is really that simple, what would be the value of a data test here? Wouldn't the counts be guaranteed to be equal? I don't see how the count of rows could be anything but exactly the same in this case.

dbeatty10 avatar Jul 26 '24 18:07 dbeatty10

Hi adding my 2 cents here and I raised this with dbt support too. It's useful in our case when the unioned model is incremental. i.e. we're selecting from stg_my_source_a where timestamp > now - 2 days for example. And want to see if the unioned model still has all the rows in the individual staging models. Hope that makes sense.

edmei1 avatar Jul 27 '24 05:07 edmei1

Thanks for that info @edmei1 !

Could you share an example model that shows more detail? Something like this:

models/int_my_source.sql

{{
    config(
        materialized='incremental'
    )
}}

with unioned_models as (

    select * from {{ ref('stg_my_source_a') }}
    union all
    select * from {{ ref('stg_my_source_b') }}
    union all
    select * from {{ ref('stg_my_source_c') }}

)

select *
from unioned_models

{% if is_incremental() %}

where event_time >= (select coalesce(max(event_time),'1900-01-01') from {{ this }} )

{% endif %}

dbeatty10 avatar Jul 30 '24 16:07 dbeatty10

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions[bot] avatar Oct 29 '24 01:10 github-actions[bot]

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

github-actions[bot] avatar Nov 06 '24 01:11 github-actions[bot]

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

github-actions[bot] avatar Nov 06 '24 01:11 github-actions[bot]