fix ratio in constraint_message

Open Aigul9 opened this issue 1 year ago • 0 comments

Hello!

I've just set up the library and noticed that the ratio reported in constraint_message appears to be inverted:

Here is the data example: image

The tests: image

And the sample of the results: image

As you can see, the first constraint_message says that 60% of the data didn't meet the requirement, although 60% of it did. The second row says that 0% didn't meet it, which would mean that 100% passed, though the opposite is true: none of the values in the ga_visits column is unique.

Description of changes: I propose changing the formula that calculates the ratio in constraint_message so that it becomes the ratio of mismatched values. With val ratio = mismatchCount.toDouble / primaryCount, the results for my case would be 4/10 = 0.4 and 10/10 = 1 "didn't meet the constraint requirement".
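
To make the arithmetic concrete, here is a minimal sketch of the two ratios. Only mismatchCount, primaryCount, and the proposed formula come from this issue; the object and method names are hypothetical, and the "matched" formula is an assumption inferred from the reported results (0.6 and 0), not taken from the library source.

```scala
// Hypothetical sketch: names other than mismatchCount and primaryCount are illustrative.
object RatioSketch {
  // Assumed current behaviour, inferred from the reported messages:
  // the share of rows that DID meet the constraint.
  def matchedRatio(mismatchCount: Long, primaryCount: Long): Double =
    (primaryCount - mismatchCount).toDouble / primaryCount

  // Proposed behaviour: the share of rows that did NOT meet the constraint.
  def mismatchedRatio(mismatchCount: Long, primaryCount: Long): Double =
    mismatchCount.toDouble / primaryCount

  def main(args: Array[String]): Unit = {
    // First row of the example: 4 of 10 rows fail the check.
    println(matchedRatio(4, 10))
    println(mismatchedRatio(4, 10))
    // Second row: all 10 values in ga_visits fail the uniqueness check.
    println(matchedRatio(10, 10))
    println(mismatchedRatio(10, 10))
  }
}
```

Neither method guards against primaryCount being 0; a real change would need to decide what the message should say for an empty dataset.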

Another approach is to omit the "not" from the message; however, I'm not sure whether that fits the intended logic.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Aigul9, Dec 06 '23 13:12