"#" modifier cannot be used for magnitude formatting

Open Mark42XLII opened this issue 3 years ago • 5 comments

Consider this:

import pint
ureg = pint.UnitRegistry()
x = 0.9999
q = x * ureg.meters
print(f"{x:#.2g}")
print(f"{q:#.2g}")

This outputs:

1.0
1e+03 millimeter

While I would expect:

1.0
1.0 meter

Being able to use something like format(value, f"#.{n}g") is extremely useful, as it basically ensures that value is printed with n significant digits.
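As a quick illustration of why the # flag matters here, the sketch below uses plain floats only (no pint). The helper name `sig` is made up for this example; it just wraps the `format` call mentioned above.

```python
def sig(value: float, n: int) -> str:
    """Format value with n significant digits, keeping trailing zeros.

    Hypothetical helper for illustration; it only wraps the built-in format().
    """
    return format(value, f"#.{n}g")


# Without "#", the "g" presentation type strips trailing zeros.
print(format(0.9999, ".2g"))  # 1
# With "#", the trailing zero (and the decimal point) is kept.
print(sig(0.9999, 2))  # 1.0
print(sig(2.5, 3))  # 2.50
```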

The issue here is that pint strips all occurrences of the # modifier here and here, as if it could only mean the intent to use .to_compact() on a quantity. Thus, it never makes its way into mspec, making it impossible to use for magnitude formatting.

I can see two solutions for this: either the modifier could be changed to something else (I've changed it to @ privately to keep working, but that would break backwards compatibility), or a more rigid ordering could be required. For example, the rule could be that the # modifier must not precede the magnitude format specification and/or that it must precede the unit format specification.

Mark42XLII avatar Nov 15 '21 04:11 Mark42XLII

I think it might be easier to do the split into unit / magnitude formatters earlier:

mspec, uspec = split_quantity_format(spec)

then uspec can be modified without affecting mspec.

We just have to keep in mind that there are plans to add quantity formats (i.e. ones that know how to format both the magnitude and the units), so we shouldn't make that more difficult to support.
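A rough sketch of what such an early split could look like, combined with the stricter ordering suggested in the original report (a # inside the leading magnitude spec stays with the magnitude; a # after it requests compacting). Everything here is an assumption for illustration: the function body, the regular expression, and the three-value return are not pint's implementation, and only the name `split_quantity_format` is taken from the comment above.

```python
import re

# Approximation of Python's standard format spec:
# [[fill]align][sign][#][0][width][grouping][.precision][type]
_MSPEC = re.compile(
    r"(?:.?[<>=^])?[-+ ]?#?0?\d*[,_]?(?:\.\d+)?[bcdeEfFgGnosxX%]?"
)


def split_quantity_format(spec: str) -> tuple[str, bool, str]:
    """Hypothetical splitter: return (mspec, compact, uspec).

    The leading part that looks like a standard format spec is the
    magnitude spec. In the remainder, "#" means "compact" and the
    rest is treated as the unit spec.
    """
    match = _MSPEC.match(spec)  # always matches, possibly the empty string
    mspec = match.group(0)
    rest = spec[match.end():]
    compact = "#" in rest
    uspec = rest.replace("#", "")
    return mspec, compact, uspec


print(split_quantity_format("#.2g#~P"))  # ('#.2g', True, '~P')
print(split_quantity_format(".2f~P"))    # ('.2f', False, '~P')
print(split_quantity_format("~P"))       # ('', False, '~P')
```

Note that under this rule a bare "#" would go to the magnitude, which differs from current behaviour, so it would not be backwards compatible.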

keewis avatar Nov 15 '21 10:11 keewis

It appears that with #1419, this has now changed: a # in a format specifier continues to call to_compact, but a # in default_format is now passed through for magnitude formatting, both regardless of location.

This unfortunately causes a bit of a nightmare for Decimal. Python silently chooses between two decimal implementations: '#' is supported by _pydecimal but not by cdecimal, and cdecimal does not give a useful error message when it fails. As a result, a default format string that previously called to_compact will now break on some systems, but not others.

cgevans avatar Mar 02 '22 23:03 cgevans

@cgevans can you provide an example?

jules-ch avatar Mar 03 '22 06:03 jules-ch

Here, for example, '#' works explicitly but not by default:

import pint
ureg = pint.UnitRegistry()
q = ureg.Quantity(1000, "mm")
format(q, "#~P")
#[Out]# '1.0 m'
format(q, "")
#[Out]# '1000 millimeter'
ureg.default_format = "#~P"
format(q)
#[Out]# '1000 mm'
ureg.default_format = "~#P"
format(q)
#[Out]# '1000 mm'
ureg.default_format = "~P#"
format(q)
#[Out]# '1000 mm'
format(q, "#")
#[Out]# '1.0 m'
ureg.default_format = "#"
format(q)
#[Out]# '1000 millimeter'
format(q, "#")
#[Out]# '1.0 meter'

With Decimal, you can see clearly that the default # is being passed through to the magnitude, while the explicit one calls to_compact. This additionally causes problems with Python issue 46904.

import pint
from _pydecimal import Decimal as PyDecimal  # Some systems use this for Decimal
from _decimal import Decimal as CDecimal  # Others use this
ureg_py = pint.UnitRegistry(non_int_type=PyDecimal)
ureg_c = pint.UnitRegistry(non_int_type=CDecimal)
q_py = ureg_py("1000. mm")
q_c = ureg_c("1000. mm")
format(q_py)  # No decimal.
#[Out]# '1000 millimeter'
format(q_c)
#[Out]# '1000 millimeter'
format(q_py, "#")  # compact
#[Out]# '1.000 meter'
format(q_c, "#")
#[Out]# '1.000 meter'
ureg_py.default_format = "#"
ureg_c.default_format = "#"
format(q_py)  # decimal point -> # is being passed to PyDecimal
#[Out]# '1000. millimeter'
format(q_py, "#")  # decimal point and compact
#[Out]# '1.000 meter'
format(q_c)  # CDecimal doesn't support #
#[Out]# => ValueError: invalid format string
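To check how a given interpreter behaves without involving pint at all, a small probe along these lines may help. It is only a sketch: the function name `probe_alt_flag` is made up, and no particular outcome is assumed, since the result may depend on the Python version and build.

```python
import importlib


def probe_alt_flag(module_name: str) -> str:
    """Report how Decimal from the given module handles the "#" format flag."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return "module not available"
    try:
        return format(module.Decimal("1000"), "#")
    except ValueError as exc:
        return f"ValueError: {exc}"


# _pydecimal is the pure-Python implementation, _decimal the C one.
results = {name: probe_alt_flag(name) for name in ("_pydecimal", "_decimal")}
for name, outcome in results.items():
    print(f"{name}: {outcome}")
```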

cgevans avatar Mar 03 '22 12:03 cgevans

Upon further consideration, it might be a good idea to rethink our string formatting mini-language: at the moment, we allow magnitude, unit, and quantity formatters to be mixed freely, but inevitably there will be character conflicts.

As such, how about we change the mini-language to:

mspec[pint_spec]

where mspec is the magnitude format and pint_spec the unit / quantity format.

For example, with an mspec of .03f# and a pint_spec of ~D this would become

.03f#[~D]

That way, the intent would be very clear, and parsing the format would also be very easy.
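To give an idea of how simple the parsing could be, here is a minimal sketch of a parser for the bracketed form. The function name `parse_bracketed_format` and the regular expression are assumptions made for this example, not existing pint API.

```python
import re

# mspec is everything before an optional trailing "[pint_spec]".
_BRACKETED = re.compile(r"(?P<mspec>[^\[\]]*)(?:\[(?P<pint_spec>[^\[\]]*)\])?")


def parse_bracketed_format(spec: str) -> tuple[str, str]:
    """Hypothetical parser: split "mspec[pint_spec]" into its two parts."""
    match = _BRACKETED.fullmatch(spec)
    if match is None:
        raise ValueError(f"invalid format string: {spec!r}")
    return match.group("mspec"), match.group("pint_spec") or ""


print(parse_bracketed_format(".03f#[~D]"))  # ('.03f#', '~D')

# "#" inside mspec is unambiguous, so it can go straight to the magnitude.
mspec, pint_spec = parse_bracketed_format("#.2g[~P]")
print(format(0.9999, mspec), pint_spec)  # 1.0 ~P
```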

Just to be sure, though, I'm not insisting on this particular format: any format that has the same effect would be fine with me.

keewis avatar May 29 '22 22:05 keewis