TableReport float precision
Related to #1410
Edit: I see the tests are failing, but im gonna take a look at it when i wake up. Also any feedback is welcome, especially to lmk if this is the right direction, thanks.
Edit2: I guess we can also get rid of the rounding applied directly to the sub_df and instead round each value v (if its a float) when we iterate on every row of the sub_df. Im also aware of the linting issues.
Edit2: I guess we can also get rid of the rounding applied directly to the sub_df and instead round each value
v(if its a float) when we iterate on every row of the sub_df. Im also aware of the linting issues.
This would avoid the inconsistency between Polars and Pandas and it would keep the formatting where it's supposed to be used (the templating code), so it's a better solution overall. @jeromedockes suggested the same thing in the issue
There should still be a check for float_precision that raises an exception when the number is < 0, and the relative test
Edit2: I guess we can also get rid of the rounding applied directly to the sub_df and instead round each value
v(if its a float) when we iterate on every row of the sub_df. Im also aware of the linting issues.This would avoid the inconsistency between Polars and Pandas and it would keep the formatting where it's supposed to be used (the templating code), so it's a better solution overall. @jeromedockes suggested the same thing in the issue
There should still be a check for
float_precisionthat raises an exception when the number is < 0, and the relative test
Just to clarify, should we move the rounding to every value v when iterating through the sub_df rows, or do we want to skip any rounding upfront and just pass the float_precision to the html template to handle the formatting there?
For the latter, we'd still need to use rounding to handle negative float_precision values in order to replicate pandas behaviour i guess.
Hey @georgescutelnicu, I overlooked a different solution that would simplify this PR a lot, and would make it more convenient to set up.
A recent PR implemented a global config for skrub, which could be used here to set the precision everywhere without having to deal with the internals of the TableReport.
You might have to merge upstream/main on your branch, but once you do you'll have the skrub/_config.py file to modify. All there is to do is add a "floating_precision" parameter to the config, which means updating the _global_config dictionary, set_config and config_context.
Then, you can go in skrub/_reporting/utils.py and update what's in format_number to something like this:
if isinstance(number, numbers.Real):
var = get_config()["float_precision"]
return f"{number:#.{var}g}"
And finally update the template skrub/_reporting/_data/template/dataframe-sample.html by setting
{{ cell["value"]}}
to
{{ cell["value"] | format_number}}
By doing this, the user would be able to set the desired number of digits through set_config and the config_context. It also sidesteps the issue of having different behaviors with pandas and polars.
Unfortunately, it also means that most of what you've done so far will have to be rewritten, I'm really sorry about that 🙈
Closing this in favor of #1470