datacompy icon indicating copy to clipboard operation
datacompy copied to clipboard

Snowflake per-column tolerances

Open mparaz opened this issue 5 months ago • 5 comments

When performing comparisons on entire DataFrames (Snowflake tables), the tolerances required may be different per-column. abs_tol and rel_tol take Dict[str, float] to map between the column names and their tolerances, with the special value __default to refer to columns that are not explicitly specified.

The report is updated to:

Default Absolute Tolerance: 0
Default Relative Tolerance: 0
...
Columns with Unequal Values or Types
------------------------------------

      Column DF1 dtype DF2 dtype  # Unequal  Max Diff  # Null Diff  Abs Tol  Rel Tol
1   DATE_FLD    string    string          2      0.00            2     0.00     0.00
2  FLOAT_FLD    double    double          2      0.05            1     0.01     0.02
0       NAME    string    string          1      0.00            0     0.00     0.00

Unit tests added:

  • Showing the per-column tolerances, and a per-column tolerance with a default
  • Showing tolerances in all_mismatch().

All Snowflake tests pass.

Resolves #416

mparaz avatar Jun 09 '25 13:06 mparaz

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Jun 09 '25 13:06 CLAassistant

@mparaz Just confirming is this ready for review? might need some linting so the CICD can pass.

fdosani avatar Jun 23 '25 19:06 fdosani

@mparaz Just confirming is this ready for review? might need some linting so the CICD can pass.

Oh sorry I thought you were asking the other team members about the report format. I’ll go ahead and do it as you described and add the test case for the effects of changing to per-column tolerance.

I’ll fix the linting. Thanks!

mparaz avatar Jun 24 '25 00:06 mparaz

@mparaz Just confirming is this ready for review? might need some linting so the CICD can pass.

This is now ready for review, all changes discussed have been made. Thanks!

mparaz avatar Jun 24 '25 13:06 mparaz

@rhaffar if you have some time on friday maybe to just test everything out offline and run it against a live cluster?

fdosani avatar Jun 24 '25 21:06 fdosani

@rhaffar if you have some time on friday maybe to just test everything out offline and run it against a live cluster?

Running now

rhaffar avatar Jun 30 '25 15:06 rhaffar

Getting an odd error running the following test - test_all_mismatch_ignore_matching_cols_no_cols_matching_rel_tol_dict I can take a look at it later, but @mparaz feel free to take a look yourself sooner if you have the time.

rhaffar avatar Jun 30 '25 18:06 rhaffar

Getting an odd error running the following test - test_all_mismatch_ignore_matching_cols_no_cols_matching_rel_tol_dict

Sorry, I committed the breakpoint() I added! Now fixed.

mparaz avatar Jul 01 '25 12:07 mparaz

Just waiting on #424 to get merged in and then will rebase this PR for final review. etc. If it is OK I might push some changes to your branch to include the feature for the other frameworks.

fdosani avatar Jul 02 '25 15:07 fdosani

Just waiting on #424 to get merged in and then will rebase this PR for final review. etc. If it is OK I might push some changes to your branch to include the feature for the other frameworks.

Sure I'm okay for you to push on this branch, though wouldn't it make the PR too large? I'm guessing you want per-column tolerances to all be in one PR?

mparaz avatar Jul 05 '25 10:07 mparaz