Equality test calculation doesn't make sense

Open foundinblank opened this issue 1 year ago • 2 comments

Describe the bug

This is to discuss the logic behind the existing calculation in the equality test and propose a more straightforward calculation.

Currently the calculation is:

https://github.com/dbt-labs/dbt-utils/blob/6ba7b660b1d1b4e3b41cb0cf6c3c0e4a70ae54e4/macros/generic_tests/equality.sql#L8-L11

This has the effect of reporting confusing failure counts in the console whenever the number of a_minus_b rows differs from the number of b_minus_a rows.

Example:

  1. Failure table contains 1,000 rows total
  2. Failure table contains 250 rows of a_minus_b
  3. Failure table contains 750 rows of b_minus_a
  4. My expectation is the output in the console would report [FAIL 1000] because the failure table has 1,000 rows
  5. However, the actual output is [FAIL 1500] because
    1. count(*) = 1,000
    2. abs(250-750) = 500
    3. total = 1,500
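
To make the arithmetic above concrete, here is a minimal Python sketch of the two calculations. It assumes the current logic amounts to `count(*)` plus the absolute difference between the two row counts, as described in this issue; the function names are hypothetical and are not part of dbt-utils.

```python
def current_fail_calc(a_minus_b: int, b_minus_a: int) -> int:
    """Failure count as described in this issue: count(*) + abs(difference).

    Assumes the failure table contains only a_minus_b and b_minus_a rows.
    """
    total_rows = a_minus_b + b_minus_a  # count(*) over the failure table
    return total_rows + abs(a_minus_b - b_minus_a)


def proposed_fail_calc(a_minus_b: int, b_minus_a: int) -> int:
    """Proposed alternative: just count(*) over the failure table."""
    return a_minus_b + b_minus_a


if __name__ == "__main__":
    print(current_fail_calc(250, 750))   # 1500
    print(proposed_fail_calc(250, 750))  # 1000
```

Under that assumption the current formula always works out to twice the larger of the two counts, so it only matches the number of rows in the failure table when both counts are equal.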

As a dbt user, if I see 1,500 failures, I expect to find 1,500 failing rows, and it's very confusing when I investigate and find 1,000 failing rows instead.

This calculation seems to have been part of the equality test since it was first committed in 2017 by @jthandy. Would anyone know why the calculation is set up in this way, and if we would be fine with just a count(*) instead? Happy to make the PR for it after some discussion! If there is a good reason for keeping the calculation as-is, I'm happy to write additional comments in the macro file for other users who may be confused too 😄

Are you interested in contributing the fix?

Yes!

foundinblank, Sep 14 '23 08:09

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions[bot], Mar 13 '24 01:03

Let's keep this open

foundinblank, Mar 13 '24 13:03

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions[bot], Sep 10 '24 01:09

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

github-actions[bot], Sep 18 '24 01:09