PublicData.get_decimal_precisions fails for very small or large numerical values

Open michael-t-alexander opened this issue 1 year ago • 4 comments

If the mode of a float type column is <= 1e-5 or >= 1e16, this line in get_decimal_precisions fails, because the string representation of the mode uses scientific notation (for example, 1e-09) and does not contain a decimal point: https://github.com/interpretml/DiCE/blob/8027ebbf696e8b6c9344a889fb1ba4e90ea448d9/dice_ml/data_interfaces/public_data_interface.py#L396

michael-t-alexander avatar Aug 23 '24 09:08 michael-t-alexander

I have encountered this issue as well. Unlike other issues, this one generally cannot be avoided without modifying the training data in a way that changes its meaning. I do not see a good workaround; some datasets simply cannot be used with DiCE until this bug is fixed.

fabiensatalia avatar Jan 23 '25 16:01 fabiensatalia

I too encountered this.

VinuraD avatar Mar 06 '25 21:03 VinuraD

Hi, would you mind adding a minimal reproducible example so that one can investigate this?

CloseChoice avatar Jun 02 '25 18:06 CloseChoice

I do not have time to provide an MWE that involves DiCE, but here is an MWE that makes the same mistake as DiCE:

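import pandas as pd
# A float column whose mode is rendered in scientific notation.
x = pd.DataFrame({'col1': [1e-9]})
modes = x['col1'].mode()
# str(modes[0]) is '1e-09', which contains no '.', so split('.') returns a
# single-element list and indexing [1] raises IndexError.
str(modes[0]).split('.')[1]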

Any dataset with a column whose mode's string representation does not contain a dot will cause the same issue. The assumption is made on line 396 (as quoted above) that any value returned by mode(), when represented as a string, contains a dot. This assumption is incorrect. The fix is to find a different way to calculate maxp.
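One possible way to calculate the precision without relying on a dot in the string is sketched below. This is not DiCE's code and has not been tested against DiCE. The helper names decimal_places and max_precision are hypothetical, and the sketch assumes that a non-finite mode (NaN or infinity) should count as zero decimal places. It parses the float's repr with the standard-library decimal module and reads the number of decimal places from the exponent, which works for plain and scientific notation alike.

```python
import math
from decimal import Decimal
from typing import Iterable


def decimal_places(value: float) -> int:
    """Count the decimal places in the shortest repr of a float.

    Hypothetical helper: works for plain ('0.25') and scientific
    ('1e-09', '1.5e-05', '1e+16') representations alike.
    """
    # Assumption: NaN and infinity are treated as having no decimal places.
    if not math.isfinite(value):
        return 0
    # Decimal parses scientific notation exactly; a negative exponent is
    # the number of digits after the decimal point.
    exponent = Decimal(repr(float(value))).as_tuple().exponent
    return max(0, -exponent)


def max_precision(modes: Iterable[float]) -> int:
    """Largest number of decimal places among the given modes (0 if empty)."""
    return max((decimal_places(m) for m in modes), default=0)
```

With the MWE above, max_precision(x['col1'].mode()) would give 9 instead of raising an IndexError. Whether that is the precision DiCE should use for such columns is a separate design question for the maintainers.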

fabiensatalia avatar Jun 03 '25 07:06 fabiensatalia