skrub icon indicating copy to clipboard operation
skrub copied to clipboard

TableReport float precision

Open georgescutelnicu opened this issue 6 months ago • 2 comments

Related to #1410

Edit: I see the tests are failing, but im gonna take a look at it when i wake up. Also any feedback is welcome, especially to lmk if this is the right direction, thanks.

Edit2: I guess we can also get rid of the rounding applied directly to the sub_df and instead round each value v (if its a float) when we iterate on every row of the sub_df. Im also aware of the linting issues.

georgescutelnicu avatar Jun 20 '25 23:06 georgescutelnicu

Edit2: I guess we can also get rid of the rounding applied directly to the sub_df and instead round each value v (if its a float) when we iterate on every row of the sub_df. Im also aware of the linting issues.

This would avoid the inconsistency between Polars and Pandas and it would keep the formatting where it's supposed to be used (the templating code), so it's a better solution overall. @jeromedockes suggested the same thing in the issue

There should still be a check for float_precision that raises an exception when the number is < 0, and the relative test

rcap107 avatar Jun 23 '25 11:06 rcap107

Edit2: I guess we can also get rid of the rounding applied directly to the sub_df and instead round each value v (if its a float) when we iterate on every row of the sub_df. Im also aware of the linting issues.

This would avoid the inconsistency between Polars and Pandas and it would keep the formatting where it's supposed to be used (the templating code), so it's a better solution overall. @jeromedockes suggested the same thing in the issue

There should still be a check for float_precision that raises an exception when the number is < 0, and the relative test

Just to clarify, should we move the rounding to every value v when iterating through the sub_df rows, or do we want to skip any rounding upfront and just pass the float_precision to the html template to handle the formatting there?

For the latter, we'd still need to use rounding to handle negative float_precision values in order to replicate pandas behaviour i guess.

georgescutelnicu avatar Jun 23 '25 17:06 georgescutelnicu

Hey @georgescutelnicu, I overlooked a different solution that would simplify this PR a lot, and would make it more convenient to set up.

A recent PR implemented a global config for skrub, which could be used here to set the precision everywhere without having to deal with the internals of the TableReport.

You might have to merge upstream/main on your branch, but once you do you'll have the skrub/_config.py file to modify. All there is to do is add a "floating_precision" parameter to the config, which means updating the _global_config dictionary, set_config and config_context.

Then, you can go in skrub/_reporting/utils.py and update what's in format_number to something like this:

    if isinstance(number, numbers.Real):
        var = get_config()["float_precision"]
        return f"{number:#.{var}g}"

And finally update the template skrub/_reporting/_data/template/dataframe-sample.html by setting

{{ cell["value"]}}

to

{{ cell["value"] | format_number}}

By doing this, the user would be able to set the desired number of digits through set_config and the config_context. It also sidesteps the issue of having different behaviors with pandas and polars.

Unfortunately, it also means that most of what you've done so far will have to be rewritten, I'm really sorry about that 🙈

rcap107 avatar Jun 24 '25 18:06 rcap107

Closing this in favor of #1470

rcap107 avatar Jun 26 '25 10:06 rcap107