ENH: Expose integer formatting also for default display, not only styler

Open enryH opened this issue 1 year ago • 14 comments

Feature Type

  • [X] Adding new functionality to pandas

  • [ ] Changing existing functionality in pandas

  • [ ] Removing existing functionality in pandas

Problem Description

I noticed that in pandas 2.2 the internal implementation of IntArrayFormatter was changed (or at least renamed), with the classes marked as private by a leading underscore. I used the implementation described here to format integers in a DataFrame on every display.

See the changes between 2.1 and 2.2 needed to make the linked patch work: link

I see that the styler supports formatting thousands:

import pandas as pd
pd.options.styler.format.thousands = ','
s = pd.Series([1_000_000]).to_frame()
s.style  # displays s as 1,000,000

however

s  # still displays 1000000

Would it be possible to add the same to the display options?

Feature Description

pd.options.format.thousands = ','

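would then need to pick the integer formatter along these lines:

if pd.options.format.thousands:
    # Build e.g. '{:,d}'.format; doubled braces are literal braces in an f-string
    format_str = f'{{:{pd.options.format.thousands}d}}'.format
else:
    format_str = '{:d}'.format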
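The selection logic in the proposal can be sketched as a small standalone function. This assumes the separator arrives as a plain string from the proposed option; `make_int_formatter` is a hypothetical name, not a pandas API. Python's format-spec mini-language only accepts ',' and '_' as grouping characters, so anything else is rejected here.

```python
from typing import Callable


def make_int_formatter(sep: str = "") -> Callable[[int], str]:
    """Return a str.format-style callable for integers (hypothetical helper).

    sep is the thousands separator; '' means no grouping. Python's
    format-spec mini-language only supports ',' and '_' as grouping options.
    """
    if sep == "":
        return "{:d}".format
    if sep in (",", "_"):
        # Build e.g. '{:,d}'; the doubled braces survive the f-string as literals.
        return f"{{:{sep}d}}".format
    raise ValueError(f"unsupported grouping option: {sep!r}")


print(make_int_formatter(",")(1_000_000))  # 1,000,000
print(make_int_formatter("_")(1_000_000))  # 1_000_000
print(make_int_formatter()(1_000_000))     # 1000000
```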

Alternative Solutions

Change the default here

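import pandas.io.formats.format as pf  # private module; may change without notice


class IntArrayFormatter(pf._GenericArrayFormatter):

    def _format_strings(self):
        # Fall back to comma-grouped integers when no explicit formatter is set
        formatter = self.formatter or '{:,d}'.format
        fmt_values = [formatter(x) for x in self.values]
        return fmt_values

# Replace the private class that pandas looks up when formatting int arrays
pf._IntArrayFormatter = IntArrayFormatter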
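Because 2.2 renamed the class from IntArrayFormatter to _IntArrayFormatter, a patch written against one name does not work on the other version. A stdlib-only sketch of a lookup that tolerates both names: `resolve_attr` is a hypothetical helper, and the two SimpleNamespace objects merely stand in for pandas.io.formats.format before and after the rename. This is still patching private internals, which can change again without notice.

```python
from types import SimpleNamespace
from typing import Any


def resolve_attr(module: Any, *names: str) -> Any:
    """Return the first attribute of `module` found among `names`."""
    for name in names:
        if hasattr(module, name):
            return getattr(module, name)
    raise AttributeError(f"none of {names} found on {module!r}")


# Stand-ins for pandas.io.formats.format before and after the 2.2 rename.
fmt_21 = SimpleNamespace(IntArrayFormatter="public class (2.1)")
fmt_22 = SimpleNamespace(_IntArrayFormatter="private class (2.2)")

print(resolve_attr(fmt_21, "_IntArrayFormatter", "IntArrayFormatter"))
print(resolve_attr(fmt_22, "_IntArrayFormatter", "IntArrayFormatter"))
```

With the real module, the same call would be resolve_attr(pf, "_IntArrayFormatter", "IntArrayFormatter"), and the patched subclass would be assigned back under whichever name was found.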

Using the above patch I get the display I would prefer:

>>> pd.Series([1_000_000])
0    1,000,000
dtype: int64

Additional Context

Working with large integers in tables is made easier if thousands are visually distinguished. Maybe there is already a way to specify that formatter function (str.format) for display, which would be perfect.

enryH, Jan 31 '24 16:01

Thanks for the request. From your example:

> s.style # does not change display of s

I'm seeing that this does display the comma separator, are you not? Or perhaps you meant that s without .style still does not show the commas?

rhshadrach, Feb 01 '24 22:02

I don't think this feature is about Styler. It is about adding to the default DataFrame display a capability that Styler already has and demonstrates.

attack68, Feb 02 '24 06:02

Yes, exactly. Currently I have to use pandas internals to make s display with commas.

>>> import pandas as pd
>>> pd.options.styler.format.thousands = ','  # here it would be great to have the option pd.options.display.thousands 
>>> s = pd.Series([1_000_000])

>>> s
0    1000000
dtype: int64
>>> s.to_frame().style
           0
0  1,000,000

I see that I initially put a "not" in the wrong place... Sorry about that. I'll clarify my example above by editing it.

enryH, Feb 02 '24 10:02

I would personally find this feature convenient, so I'm positive on it. @attack68 do you have any thoughts?

If we do implement this, I think it should only be implemented for integer/float dtypes, and documented as such. Users should not expect pandas to infer whether to insert commas in object dtype values.
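The numeric-only rule can be illustrated per value as below. This is only a sketch of the rule itself; pandas picks its formatter per dtype rather than per value, and `format_numeric` is a hypothetical name. Integers and floats get a thousands separator, while bools (a subclass of int in Python) and everything else, such as numeric-looking strings in an object column, fall through to str(x).

```python
import numbers
from typing import Any


def format_numeric(x: Any, sep: str = ",") -> str:
    """Group thousands for ints/floats only (hypothetical helper).

    sep must be ',' or '_', the grouping options str.format supports.
    """
    if isinstance(x, bool):
        # bool is a subclass of int, so exclude it explicitly.
        return str(x)
    if isinstance(x, numbers.Integral):
        return format(x, f"{sep}d")
    if isinstance(x, numbers.Real):
        return format(x, sep)
    # Object-like values are left untouched; no inference from their content.
    return str(x)


print(format_numeric(1_000_000))  # 1,000,000
print(format_numeric("1000000"))  # 1000000
```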

rhshadrach, Feb 02 '24 22:02

Hello @enryH,

My project team is looking for pandas enhancements to work on for our semester-long grad school project. We saw this task and would like to contribute if possible!

Meadiocre, Feb 03 '24 02:02

I think it's generally quite handy to have configurable options for precision, thousands separator and decimal separator. If I recall correctly, Styler avoided using the locale package, but in the end it was a bit tricky to make sure the European decimal comma was compatible with the US style (comma for thousands, point for decimals).

To keep it simple, this should definitely only be applied to float/int dtypes.
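The separator clash attack68 mentions can be handled without the locale module by formatting in Python's built-in US convention first and then swapping characters in a single pass. A minimal sketch, assuming fixed precision and that the caller supplies the separators; `format_float` is a hypothetical name, and this is not necessarily how Styler implements it.

```python
def format_float(
    x: float, precision: int = 2, thousands: str = ",", decimal: str = "."
) -> str:
    """Format x with custom thousands/decimal separators (hypothetical helper)."""
    # Built-in formatting always groups with ',' and uses '.' as the decimal point.
    us_style = f"{x:,.{precision}f}"
    # str.translate maps every character in one pass, so swapping ',' and '.'
    # cannot clobber characters that were already replaced.
    return us_style.translate(str.maketrans({",": thousands, ".": decimal}))


print(format_float(1234567.891))                               # 1,234,567.89
print(format_float(1234567.891, thousands=".", decimal=","))   # 1.234.567,89
```

A naive chain of str.replace calls would need a temporary placeholder to avoid turning every separator into the same character; the translation table avoids that step.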

attack68, Feb 03 '24 07:02

take

Meadiocre, Feb 04 '24 01:02

I guess it will boil down to figuring out how the formatting function is set (see IntArrayFormatter here):

formatter = self.formatter or '{:d}'.format  # how to set self.formatter from pandas.options
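One possible answer to the question in that code comment is to resolve the default when the strings are produced rather than once at import, so that changing an option takes effect on the next display. A stdlib-only sketch: `OPTIONS` is a plain dict standing in for the pandas option registry, and both the "display.thousands" key and the `IntFormatterSketch` class are hypothetical; the real formatter has a different constructor.

```python
from typing import Callable, Iterable, List, Optional

# Stand-in for the pandas option registry; "display.thousands" is a hypothetical key.
OPTIONS = {"display.thousands": ""}


class IntFormatterSketch:
    """Simplified stand-in for an integer array formatter."""

    def __init__(
        self, values: Iterable[int], formatter: Optional[Callable[[int], str]] = None
    ):
        self.values = list(values)
        self.formatter = formatter

    def _default_formatter(self) -> Callable[[int], str]:
        # Read the option at format time, not at import time.
        sep = OPTIONS["display.thousands"]
        return f"{{:{sep}d}}".format if sep else "{:d}".format

    def _format_strings(self) -> List[str]:
        # An explicitly passed formatter still wins, as in the line above.
        formatter = self.formatter or self._default_formatter()
        return [formatter(x) for x in self.values]


fmt = IntFormatterSketch([1_000_000, 42])
print(fmt._format_strings())  # ['1000000', '42']
OPTIONS["display.thousands"] = ","
print(fmt._format_strings())  # ['1,000,000', '42']
```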

enryH, Feb 04 '24 14:02

I ran into the same problem too (down to relying on the same StackOverflow code to modify display of integers). I also would like a cleaner way to do it.

Additionally, what is the recommended way to handle the name change? I have some tools that need to support both pd.io.formats.format.IntArrayFormatter and pd.io.formats.format._IntArrayFormatter. Thank you.

saiwing-yeung, Feb 12 '24 15:02

> I ran into the same problem too (down to relying on the same StackOverflow code to modify display of integers). I also would like a cleaner way to do it.

PRs welcome!

> Additionally, what is the recommended way to handle the name change? I have some tools that need to support both pd.io.formats.format.IntArrayFormatter and pd.io.formats.format._IntArrayFormatter. Thank you.

IntArrayFormatter is not intended to be public. If you have a need to use it, can you open a new issue detailing it?

rhshadrach, Feb 12 '24 21:02

Hey @rhshadrach, is the goal to create a new class, or to improve the _IntArrayFormatter class and make it public again?

Just wanted to understand what was considered in scope before tackling this issue.

If this is not the right forum/area to ask questions, please let me know!

Meadiocre, Feb 15 '24 04:02

> IntArrayFormatter is not intended to be public. If you have a need to use it, can you open a new issue detailing it?

Maybe it's easier to describe what I am trying to do, to avoid an XY problem. What I need is a way to always show commas in integers automatically. Since IntArrayFormatter is not intended to be public, I'd be happy to use any other method that accomplishes the same goal.

saiwing-yeung, Feb 15 '24 05:02

Thanks @saiwing-yeung - we're open to PRs that add an option (e.g. pd.set_option("display.numeric_separator") - could probably use a better name!). It would default to '', but users could change it to e.g. '_' or ','.
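Since Python's format spec only groups digits with ',' or '_', an option that also accepts other characters (a space, or '.') needs one extra step. A minimal sketch for integers, assuming the option value is a string that defaults to '': group with ',' first, then replace it, which is safe here because a formatted integer contains no other commas. `format_int_with_separator` is a hypothetical name; the option name itself is still open in the thread.

```python
def format_int_with_separator(x: int, sep: str = "") -> str:
    """Group the digits of x with an arbitrary separator (hypothetical helper)."""
    if not sep:
        # The default '' keeps today's output unchanged.
        return str(x)
    # Group with ',' first; a formatted int has no other commas to clobber.
    return f"{x:,d}".replace(",", sep)


for sep in ("", "_", ",", " "):
    print(repr(format_int_with_separator(-1_234_567, sep)))
```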

@Meadiocre

> is the goal to create a new class or to improve the _IntArrayFormatter class and change it back to public?

No - just expose a new option for users.

rhshadrach, Feb 15 '24 05:02