
changes to the formatting mini-language

Open keewis opened this issue 1 year ago • 1 comments

There have been a few issues about conflicts with the magnitude formats (mostly the # quantity modifier, see #1413), and the code to separate magnitude spec from unit spec is a bit complex. As such, I'd like to raise my suggestion from https://github.com/hgrecco/pint/issues/1413#issuecomment-1140537111 in a new issue: what do you think about deprecating the old format and transitioning to something less ambiguous?

The proposal from https://github.com/hgrecco/pint/issues/1413#issuecomment-1140537111:

how about we change the mini-language to:

mspec[pint_spec]

where mspec is the magnitude format and pint_spec the unit / quantity format.

For example, with a mspec of .03f# and pint_spec of ~D this would become

.03f#[~D]

That way, the intent would be very clear, and parsing the format would also be very easy.
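To illustrate how little parsing the bracketed form would need, here is a minimal sketch. The helper name `split_spec` and the regular expression are assumptions made for this example; they are not part of pint.

```python
import re

# Hypothetical helper, not pint API: split "mspec[pint_spec]" into two parts.
# The magnitude spec is everything before the bracket; the pint spec is the
# bracket's content. Neither part may itself contain brackets.
_SPEC_RE = re.compile(r"^(?P<mspec>[^\[\]]*)(?:\[(?P<uspec>[^\[\]]*)\])?$")


def split_spec(spec: str) -> tuple[str, str]:
    """Return (mspec, pint_spec) for a spec written as mspec[pint_spec]."""
    match = _SPEC_RE.match(spec)
    if match is None:
        raise ValueError(f"invalid format spec: {spec!r}")
    # A missing bracket group yields None; normalize it to an empty string.
    return match.group("mspec"), match.group("uspec") or ""


mspec, uspec = split_spec(".03f#[~D]")
print(mspec, uspec)
```

With the example above, `mspec` is `.03f#` and `uspec` is `~D`; a spec with an unbalanced bracket raises `ValueError`.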

Just to be sure, though, I'm not insisting on this particular format: any format that has the same effect would be fine with me.

keewis avatar Sep 19 '22 12:09 keewis

I actually like the idea of moving to a less ambiguous spec. The current one has served its purpose, but is becoming difficult to reason about and parse.

I would like to mention one more thing, just to have everything in the same discussion. People have sometimes asked me if I was planning to introduce a way to format unit exponents. They have suggested using ^ to indicate that what comes next is about the exponent, with the usual Python formatting codes available (e.g. ^d to use a d formatting for the exponents), plus a custom format code to format the exponent as a fraction (i.e. doing for each exponent "%d/%d" % Fraction.from_float(value).as_integer_ratio()).
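A small sketch of the fraction formatting mentioned above. The function name `format_exponent` and the `max_denominator` parameter are assumptions added for illustration: without limiting the denominator, an exponent such as 1/3 stored as a float would expand to its exact binary ratio with a very large denominator.

```python
from fractions import Fraction


def format_exponent(value: float, max_denominator: int = 1000) -> str:
    """Format a unit exponent as "numerator/denominator" (hypothetical helper)."""
    # Assumption: round to the closest fraction with a small denominator so
    # floats like 0.333... read as 1/3 rather than their exact binary ratio.
    frac = Fraction.from_float(float(value)).limit_denominator(max_denominator)
    return "%d/%d" % (frac.numerator, frac.denominator)


print(format_exponent(0.5))
print(format_exponent(1 / 3))
```

Whole-number exponents come out as `2/1` with this format string, so a real implementation would probably special-case a denominator of 1.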

Now going back to your proposal, enclosing the pint formatters is a good idea. Easy to explain, easy to parse, easy to implement. I think that [] is fine because, given where it will be located, it cannot be confused with a getitem operation inside a format string. But I just want to be sure.

hgrecco avatar Sep 23 '22 03:09 hgrecco

I tried implementing this and noticed that while the proposed format is very easy to use if both magnitude and unit specs are passed, omitting either seems a bit trickier.

In short, I forgot to define the syntax of the format spec in that case. Not sure if those are the best options, but I think we could choose to enclose the uspec in brackets, or we could detect the presence of a custom formatter and depending on that declare the entire spec string as either mspec or uspec (or maybe support both?). The advantage of the former would be to avoid conflicts between formatter names and magnitude format specs (but given that we can actually use multi-character formatter names that might be sufficiently unlikely?), while the latter would stay more consistent with what we currently have.

Since I wouldn't have to change the formats in pint-xarray (which only formats the units, never the magnitude directly) I think I'd actually prefer the latter (but again, we could support both).

Here's how this would look:

".02#f[~%#P]"  # both
".05d"  # just the mspec
"[~P]"  # uspec, option 1
"~P"  # uspec, option 2
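As a rough sketch of how supporting both options might work: a bracket marks the uspec explicitly (option 1), and a bare string is treated as a uspec only if it names a known formatter (option 2). The names `classify_spec`, `KNOWN_FORMATTERS` and `UNIT_MODIFIERS` are assumptions for this example, and the formatter set is illustrative rather than pint's actual list of registered formatters.

```python
# Assumed formatter names and modifiers, for illustration only.
KNOWN_FORMATTERS = {"D", "P", "L", "Lx", "H", "C"}
UNIT_MODIFIERS = "~#"


def classify_spec(spec: str) -> tuple[str, str]:
    """Return (mspec, uspec), accepting bracketed and bare unit specs."""
    if "[" in spec:
        # Option 1: everything inside the brackets is the unit spec.
        mspec, _, rest = spec.partition("[")
        if not rest.endswith("]"):
            raise ValueError(f"unterminated unit spec in {spec!r}")
        return mspec, rest[:-1]
    # Option 2: no brackets, so the whole string is either mspec or uspec.
    # Drop leading modifiers and check whether a formatter name remains.
    if spec.lstrip(UNIT_MODIFIERS) in KNOWN_FORMATTERS:
        return "", spec
    return spec, ""


for example in (".02#f[~%#P]", ".05d", "[~P]", "~P"):
    print(repr(example), "->", classify_spec(example))
```

The weak point of option 2 shows up directly in the lookup: a magnitude spec that happens to equal a formatter name would be misclassified, which is the conflict the bracketed form avoids.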

keewis avatar Sep 26 '22 16:09 keewis

By the way, some time ago I moved formatting to a facet. This simplified a lot of the code and made it easier to read (IMHO). Today, in #1595, I finished changing how the parser is implemented. Instead of being intertwined with the Registry (or living in a facet, as was my original plan), it is cleanly composed with it through a very thin interaction surface.

Talking with @maurosilber, he suggested that formatting might be better done in the same way, i.e. having in the registry a Formatting object that can be configured or even swapped.

It should not be very hard given the current state of affairs, and it might simplify a few things, such as the way formatting is configured, hacked, evolved, replaced, and tested.
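A minimal sketch of what such a composed, swappable formatter could look like. Every name here (`Formatter`, `PlainFormatter`, `BracketedFormatter`, `SketchRegistry`) is hypothetical and only illustrates the composition idea; none of it reflects pint's actual classes.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Formatter(Protocol):
    """Thin interface the registry relies on (hypothetical)."""

    def format_quantity(self, magnitude: float, units: str, spec: str) -> str: ...


class PlainFormatter:
    def format_quantity(self, magnitude: float, units: str, spec: str = "") -> str:
        return f"{format(magnitude, spec)} {units}"


class BracketedFormatter:
    def format_quantity(self, magnitude: float, units: str, spec: str = "") -> str:
        return f"{format(magnitude, spec)} [{units}]"


@dataclass
class SketchRegistry:
    # The registry only holds a formatter; it does not inherit formatting logic.
    formatter: Formatter = field(default_factory=PlainFormatter)

    def format(self, magnitude: float, units: str, spec: str = "") -> str:
        return self.formatter.format_quantity(magnitude, units, spec)


registry = SketchRegistry()
print(registry.format(1.5, "m", ".2f"))
registry.formatter = BracketedFormatter()  # swap the formatter at runtime
print(registry.format(1.5, "m", ".2f"))
```

Because the registry touches the formatter through a single method, each formatter can be tested or replaced on its own.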

hgrecco avatar Sep 30 '22 03:09 hgrecco