datacompy
datacompy copied to clipboard
Snowflake per-column tolerances
When performing comparisons on entire DataFrames (Snowflake tables), the tolerances required may be different per-column. abs_tol and rel_tol take Dict[str, float] to map between the column names and their tolerances, with the special value __default to refer to columns that are not explicitly specified.
The report is updated to:
Default Absolute Tolerance: 0
Default Relative Tolerance: 0
...
Columns with Unequal Values or Types
------------------------------------
Column DF1 dtype DF2 dtype # Unequal Max Diff # Null Diff Abs Tol Rel Tol
1 DATE_FLD string string 2 0.00 2 0.00 0.00
2 FLOAT_FLD double double 2 0.05 1 0.01 0.02
0 NAME string string 1 0.00 0 0.00 0.00
Unit tests added:
- Showing the per-column tolerances, and a per-column tolerance with a default
- Showing tolerances in
all_mismatch().
All Snowflake tests pass.
Resolves #416
@mparaz Just confirming is this ready for review? might need some linting so the CICD can pass.
@mparaz Just confirming is this ready for review? might need some linting so the CICD can pass.
Oh sorry I thought you were asking the other team members about the report format. I’ll go ahead and do it as you described and add the test case for the effects of changing to per-column tolerance.
I’ll fix the linting. Thanks!
@mparaz Just confirming is this ready for review? might need some linting so the CICD can pass.
This is now ready for review, all changes discussed have been made. Thanks!
@rhaffar if you have some time on friday maybe to just test everything out offline and run it against a live cluster?
@rhaffar if you have some time on friday maybe to just test everything out offline and run it against a live cluster?
Running now
Getting an odd error running the following test - test_all_mismatch_ignore_matching_cols_no_cols_matching_rel_tol_dict I can take a look at it later, but @mparaz feel free to take a look yourself sooner if you have the time.
Getting an odd error running the following test -
test_all_mismatch_ignore_matching_cols_no_cols_matching_rel_tol_dict
Sorry, I committed the breakpoint() I added! Now fixed.
Just waiting on #424 to get merged in and then will rebase this PR for final review. etc. If it is OK I might push some changes to your branch to include the feature for the other frameworks.
Just waiting on #424 to get merged in and then will rebase this PR for final review. etc. If it is OK I might push some changes to your branch to include the feature for the other frameworks.
Sure I'm okay for you to push on this branch, though wouldn't it make the PR too large? I'm guessing you want per-column tolerances to all be in one PR?