soda-core

invalid_percent not working as expected with soda-core-spark-df = 3.5.1

Open Jaisinghani opened this issue 6 months ago • 2 comments

I observed that the rule does not work with invalid_percent, while the same rule works with invalid_count. Details:

- soda-core-spark-df: 3.5.1
- Rule tested: NON_NEGATIVE

Check:

    checks for DATASET_20250701:
      - invalid_percent(SCORE):
          valid_min: 0
          warn: when > 0
          name: daily_score_check

Scan summary:

    daily_score_check [/workspace/checks.yml] [PASSED]
      check_value: 0.0
      row_count: 10
      invalid_count: 2

With 2 invalid rows out of 10, invalid_percent should be 20, so the "warn: when > 0" threshold should have fired. Instead the check passed with check_value 0.0.

However, the same rule works as expected when configured with invalid_count:

    checks for DATASET_20250701:
      - invalid_count(SCORE):
          valid_min: 0
          warn: when > 0
          name: daily_score_check

Scan summary:

    daily_score_check [/workspace/checks.yml] [WARNED]
      check_value: 2
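To make the expected results explicit, here is a small arithmetic sketch using the numbers from the two scan summaries. It assumes invalid_percent is defined as 100 × invalid_count / row_count (the percentage of rows failing the validity rule); the helper names expected_invalid_percent and warns are hypothetical and are not part of the Soda API.

```python
# Assumption: invalid_percent = 100 * invalid_count / row_count,
# i.e. the percentage of rows whose SCORE fails the validity rule.
def expected_invalid_percent(invalid_count: int, row_count: int) -> float:
    return 100.0 * invalid_count / row_count


# Hypothetical helper mirroring the "warn: when > 0" threshold used in both checks.
def warns(check_value: float, threshold: float = 0.0) -> bool:
    return check_value > threshold


# Values copied from the two scan summaries.
row_count = 10
invalid_count = 2
reported_percent = 0.0  # check_value of the invalid_percent check
reported_count = 2      # check_value of the invalid_count check

expected_percent = expected_invalid_percent(invalid_count, row_count)

print(expected_percent)          # 20.0
print(warns(reported_count))     # True  -> [WARNED], as observed
print(warns(expected_percent))   # True  -> [WARNED] would be expected
print(warns(reported_percent))   # False -> [PASSED], as observed
```

Under this assumption, both checks should end in [WARNED]; only the reported invalid_percent value of 0.0 makes the first one pass.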

Jaisinghani, Jul 07 '25 19:07

CLOUD-9194

tools-soda, Jul 07 '25 19:07

I noticed that check_value for invalid_percent is reported as 0.0 even when the actual percentage is a very small non-zero value. For example, with row_count = 1864458 and invalid_count = 3, the expected invalid_percent is 3 / 1864458 × 100 ≈ 0.00016, but the check_value returned after the scan is 0.0. This leads to confusion and misreported quality checks. It appears that the value is being rounded or truncated during result formatting or serialization in soda-core-spark-df.
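A quick sketch of the rounding/truncation hypothesis. It again assumes invalid_percent = 100 × invalid_count / row_count, and the 2-decimal precision is a guess chosen for illustration, not something confirmed from the soda-core source.

```python
import math

# Values from the example above.
row_count = 1864458
invalid_count = 3

# Assumption: invalid_percent = 100 * invalid_count / row_count.
exact = 100.0 * invalid_count / row_count  # roughly 0.00016

# Hypothetical post-processing steps that would produce the reported 0.0.
rounded_2dp = round(exact, 2)                  # rounding to 2 decimals
truncated_2dp = math.trunc(exact * 100) / 100  # truncating to 2 decimals

print(exact > 0)       # True
print(rounded_2dp)     # 0.0
print(truncated_2dp)   # 0.0

# The same 2-decimal rounding leaves the first example's value unchanged.
print(round(100.0 * 2 / 10, 2))  # 20.0
```

If this guess is right, it would explain the 0.00016 case, but rounding of this kind would not by itself explain the first example, where 20.0 was reported as 0.0.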

Jaisinghani, Jul 07 '25 23:07