data-diff icon indicating copy to clipboard operation
data-diff copied to clipboard

Add option to define level of precision for a numeric column

Open daronjp opened this issue 1 year ago • 1 comments

When migrating a table or making a minor update to a dbt model, we like to confirm the the dev and prod tables are identical. Oftentimes, especially when a table contains floats, two numbers that appear to be exact matches to the eye are evaluated by the DB (Snowflake) as not matching. This can be frustrating, as it requires additional work to prove they do indeed match.

Describe the solution you'd like I would like the ability to specify a level of precision for a given column. Using the example above, I'd like to specify that data-diff should consider two numbers that are within 0.001 (or any other value the user would like) of each other to be a match.

Describe alternatives you've considered My current workaround outside of data-diff looks like this:

-- all records in table_a not in table_b
WITH diffs AS (
    SELECT
        pk, numeric_col
    FROM table_a

    EXCEPT

    SELECT
        pk, numeric_col
    FROM table_b
)

-- ideally returns 0 rows, showing that all records in table_a are in table_b
SELECT
    b.pk, b.numeric_col, d.numeric_col
FROM table_b AS b
JOIN diffs AS d ON d.pk = b.pk
WHERE
    ABS(b.numeric_col - d.numeric_col) > 0.001

Additional context I would like to be able to specify this either in the dbt models.yml meta values, or when running a joindiff/hashdiff.

daronjp avatar Jul 19 '23 22:07 daronjp

Agree, this option is very needed. Also it can solve https://github.com/datafold/data-diff/issues/379

I can see it's already available in data-diff Cloud: https://docs.datafold.com/data_diff/in-database_diffing#tolerance-for-floats

ivan-toriya-precis avatar Jan 04 '24 13:01 ivan-toriya-precis

Hi @daronjp and @ivan-toriya-precis!

I'm sorry for the delay in following up on this. Thank you for taking the time to raise this issue!

We made a hard decision to sunset the data-diff package and won't provide further development or support.

As @ivan-toriya-precis pointed out, custom numeric precision is supported in Datafold Cloud.

-Gleb

glebmezh avatar May 17 '24 13:05 glebmezh