ENH: Expose integer formatting also for default display, not only styler
Feature Type
- [X] Adding new functionality to pandas
- [ ] Changing existing functionality in pandas
- [ ] Removing existing functionality in pandas
Problem Description
I noticed that in pandas version 2.2 the internal implementation of IntArrayFormatter was changed (or at least renamed), marking the classes as private using one underscore. I used the implementation as described here to format integers in a DataFrame on each display.
See changes between 2.1 and 2.2 to make the linked patch work: link
I see that the styler supports formatting thousands:
import pandas as pd
pd.options.styler.format.thousands = ','
s = pd.Series([1_000_000]).to_frame()
s.style # does change display of s to 1,000,000
however
s # will display 1000000
Would it be possible to add the same to the display options?
Feature Description
pd.options.format.thousands = ','
would need to pass:
if pd.options.format.thousands:
    format_str = f'{{:{pd.options.format.thousands}d}}'.format
else:
    format_str = '{:d}'.format
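The idea can be checked without pandas. This is only a minimal sketch: `make_int_formatter` is a hypothetical helper name, not a pandas API. It relies on the fact that Python's format mini-language accepts only `,` and `_` as grouping characters for `d`.

```python
def make_int_formatter(thousands=None):
    """Return a one-argument function that renders an int as text.

    Hypothetical helper for illustration; not part of pandas.
    """
    if not thousands:
        # No separator configured: plain integer formatting.
        return "{:d}".format
    if thousands not in (",", "_"):
        # The format spec itself only knows these two grouping characters.
        raise ValueError("only ',' or '_' can be used in a format spec")
    # Doubled braces produce literal braces, e.g. '{:,d}'.
    return f"{{:{thousands}d}}".format


print(make_int_formatter()(1_000_000))     # 1000000
print(make_int_formatter(",")(1_000_000))  # 1,000,000
print(make_int_formatter("_")(1_000_000))  # 1_000_000
```

Any other separator (a space, a dot) would need a post-processing step, since the format spec rejects it.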
Alternative Solutions
Change the default here
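import pandas.io.formats.format as pf

class IntArrayFormatter(pf._GenericArrayFormatter):
    def _format_strings(self):
        # fall back to a comma-grouped integer format when no formatter is given
        formatter = self.formatter or '{:,d}'.format
        fmt_values = [formatter(x) for x in self.values]
        return fmt_values

# replace the private class that pandas 2.2 uses internally
pf._IntArrayFormatter = IntArrayFormatter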
Using the above patch I get the display I would prefer:
>>> pd.Series([1_000_000])
0 1,000,000
dtype: int64
Additional Context
Working with large integers in tables is made easier if thousands are visually distinguished. Maybe there is already a way to specify that formatter function (str.format) for display, which would be perfect.
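For explicit calls, a formatter function can already be passed through public API. The sketch below uses `DataFrame.to_string(formatters=...)` and `Series.map`; the column name `n` and the sample values are made up for illustration. It does not change the default repr, so it is a partial workaround rather than the requested option.

```python
import pandas as pd

df = pd.DataFrame({"n": [1_000_000, 2_500]})

# Per-call formatting: `formatters` maps column names to one-argument functions.
text = df.to_string(formatters={"n": "{:,d}".format})
print(text)

# Alternative: convert to strings up front (the column then holds strings,
# not integers, so it is only suitable for display).
as_text = df["n"].map("{:,d}".format)
print(as_text.tolist())
```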
Thanks for the request. From your example:
> s.style # does not change display of s
I'm seeing that this does display the comma separator, are you not? Or perhaps you meant that s without .style still does not show the commas?
I don't think this feature is about Styler. It is about adding functionality that Styler already has to the default DataFrame display.
Yes exactly, currently I have to use the internals to make s itself display with the commas.
>>> import pandas as pd
>>> pd.options.styler.format.thousands = ',' # here it would be great to have the option pd.options.display.thousands
>>> s = pd.Series([1_000_000])
>>> s
0 1000000
dtype: int64
>>> s.to_frame().style
0
0 1,000,000
I see that I put a "not" in the wrong place initially... Sorry about that. I'll clarify my example above by modifying it.
I would personally find this feature convenient, I'm positive on it. @attack68 do you have any thoughts?
If we do implement this, I think it should only be implemented for integer/float dtypes, and documented as such. Users should not expect pandas to infer whether to insert commas in object dtype values.
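The dtype restriction could look roughly like the sketch below. `with_thousands` is a hypothetical helper, not pandas API; it decides from the Series dtype alone and never inspects individual object values. It assumes there are no missing values.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype


def with_thousands(series):
    """Return display strings, grouping digits only for int/float dtypes.

    Hypothetical helper for illustration; assumes no missing values.
    """
    if is_integer_dtype(series.dtype):
        return series.map(lambda v: f"{v:,d}")
    if is_float_dtype(series.dtype):
        return series.map(lambda v: f"{v:,}")
    # object and other dtypes: no inference, values are shown as they are
    return series.astype(str)


print(with_thousands(pd.Series([1_000_000])).tolist())
print(with_thousands(pd.Series(["1000000", 1_000_000], dtype=object)).tolist())
```

The second call leaves both values without commas, even though one of them is an integer, because the column dtype is object.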
Hello @enryH,
My project team is looking for pandas enhancement features for our semester-long grad school project. We saw this task and would like to contribute if possible!
I think it's generally quite handy to have configurable options for precision, thousands and decimal separator. Styler avoided using the locale package, if I recall correctly, but it was a bit tricky in the end to make sure the European decimal-comma was compatible with the US comma-decimal style.
To make it easier this should definitely only be applied to float/int dtypes.
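One locale-free way to handle both separator styles is to format in the US style first and then swap characters in a single pass. This is a sketch under assumptions: `format_number` and its parameters are hypothetical names, and this is not claimed to be how Styler does it.

```python
def format_number(value, thousands=",", decimal=".", precision=2):
    """Format with US separators, then swap them for the requested ones.

    Hypothetical helper for illustration; not part of pandas.
    """
    us_style = f"{value:,.{precision}f}"
    # A single translate() pass replaces both characters at once, so turning
    # ',' into '.' cannot clobber the decimal point written by the format spec.
    table = str.maketrans({",": thousands, ".": decimal})
    return us_style.translate(table)


print(format_number(1234567.891))                              # 1,234,567.89
print(format_number(1234567.891, thousands=".", decimal=","))  # 1.234.567,89
print(format_number(1234567.891, thousands=""))                # 1234567.89
```

Two chained `str.replace` calls would get the European case wrong, which is likely part of what makes this tricky in practice.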
> I think it's generally quite handy to have configurable options for precision, thousands and decimal separator. Styler avoided using the locale package, if I recall correctly, but it was a bit tricky in the end to make sure the European decimal-comma was compatible with the US comma-decimal style.
> To make it easier this should definitely only be applied to float/int dtypes.
take
I guess it will boil down to figuring out how the formatting function is set (see IntArrayFormatter here):
formatter = self.formatter or '{:d}'.format # how to set self.formatter from pandas.options
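The lookup order this implies (explicit formatter first, then an option, then the plain default) can be sketched without pandas. Everything here is a stand-in: `OPTIONS`, the key `display.thousands` and `IntFormatterSketch` are hypothetical and only mimic the shape of the real class.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Stand-in for an options registry; "display.thousands" is a hypothetical key.
OPTIONS = {"display.thousands": None}


@dataclass
class IntFormatterSketch:
    values: Sequence[int]
    formatter: Optional[Callable[[int], str]] = None

    def format_strings(self):
        sep = OPTIONS["display.thousands"]
        # The option only changes the default; an explicit formatter still wins.
        default = f"{{:{sep}d}}".format if sep else "{:d}".format
        formatter = self.formatter or default
        return [formatter(x) for x in self.values]


print(IntFormatterSketch([1_000_000]).format_strings())  # ['1000000']
OPTIONS["display.thousands"] = ","
print(IntFormatterSketch([1_000_000]).format_strings())  # ['1,000,000']
print(IntFormatterSketch([255], formatter=hex).format_strings())  # ['0xff']
```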
I ran into the same problem too (down to relying on the same StackOverflow code to modify display of integers). I also would like a cleaner way to do it.
Additionally, what is the recommended way to handle the name change? I have some tools that need to support both pd.io.formats.format.IntArrayFormatter and pd.io.formats.format._IntArrayFormatter. Thank you.
> I ran into the same problem too (down to relying on the same StackOverflow code to modify display of integers). I also would like a cleaner way to do it.
PRs welcome!
> Additionally, what is the recommended way to handle the name change? I have some tools that need to support both pd.io.formats.format.IntArrayFormatter and pd.io.formats.format._IntArrayFormatter. Thank you.
IntArrayFormatter is not intended to be public. If you have a need to use it, can you open a new issue detailing it.
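For tools that must keep working across the rename in the meantime, one stopgap is to look the class up under both names. This is a sketch only: `first_attr` is a hypothetical helper, the namespaces below are stand-ins for the two module layouts mentioned in this thread, and relying on either name remains unsupported because the class is private.

```python
import types


def first_attr(obj, *names):
    """Return the first attribute of `obj` that exists among `names`."""
    for name in names:
        if hasattr(obj, name):
            return getattr(obj, name)
    raise AttributeError(f"none of {names!r} found")


# Stand-ins for the 2.1-style and 2.2-style module layouts.
old_layout = types.SimpleNamespace(IntArrayFormatter="public-looking name")
new_layout = types.SimpleNamespace(_IntArrayFormatter="private name")

for layout in (old_layout, new_layout):
    print(first_attr(layout, "_IntArrayFormatter", "IntArrayFormatter"))

# With pandas installed, the same lookup would be:
#   import pandas.io.formats.format as pf
#   cls = first_attr(pf, "_IntArrayFormatter", "IntArrayFormatter")
```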
Hey @rhshadrach is the goal to create a new class or to improve the _IntArrayFormatter class and change it back to public?
Just wanted to understand what was considered in scope before tackling this issue.
If this is not the right forum/area to ask questions, please let me know!
> IntArrayFormatter is not intended to be public. If you have a need to use it, can you open a new issue detailing it.
Maybe it's easier to talk about what I am trying to do in order to avoid an XY problem. What I need is a way to automatically always show commas in integers. As IntArrayFormatter is not intended to be public, I'd be happy to use any different method to accomplish the same goal.
Thanks @saiwing-yeung - we're open for PRs to add an option (e.g. pd.set_option("display.numeric_separator") - could probably use a better name!). It would default to '', but users could change to e.g. '_' or ','.
@Meadiocre
> is the goal to create a new class or to improve the _IntArrayFormatter class and change it back to public?
No - just expose a new option for users.