data-diff
data-diff copied to clipboard
Add option to define level of precision for a numeric column
When migrating a table or making a minor update to a dbt model, we like to confirm the the dev and prod tables are identical. Oftentimes, especially when a table contains floats, two numbers that appear to be exact matches to the eye are evaluated by the DB (Snowflake) as not matching. This can be frustrating, as it requires additional work to prove they do indeed match.
Describe the solution you'd like
I would like the ability to specify a level of precision for a given column. Using the example above, I'd like to specify that data-diff should consider two numbers that are within 0.001
(or any other value the user would like) of each other to be a match.
Describe alternatives you've considered My current workaround outside of data-diff looks like this:
-- all records in table_a not in table_b
WITH diffs AS (
SELECT
pk, numeric_col
FROM table_a
EXCEPT
SELECT
pk, numeric_col
FROM table_b
)
-- ideally returns 0 rows, showing that all records in table_a are in table_b
SELECT
b.pk, b.numeric_col, d.numeric_col
FROM table_b AS b
JOIN diffs AS d ON d.pk = b.pk
WHERE
ABS(b.numeric_col - d.numeric_col) > 0.001
Additional context I would like to be able to specify this either in the dbt models.yml meta values, or when running a joindiff/hashdiff.
Agree, this option is very needed. Also it can solve https://github.com/datafold/data-diff/issues/379
I can see it's already available in data-diff Cloud: https://docs.datafold.com/data_diff/in-database_diffing#tolerance-for-floats
Hi @daronjp and @ivan-toriya-precis!
I'm sorry for the delay in following up on this. Thank you for taking the time to raise this issue!
We made a hard decision to sunset the data-diff package and won't provide further development or support.
As @ivan-toriya-precis pointed out, custom numeric precision is supported in Datafold Cloud.
-Gleb