
Add a description about formatting decimal.Decimal with the e/E symbol by __format__.

Open 1nftf opened this issue 1 month ago • 18 comments

Test output
# Compare different precision on Decimal.
format(Decimal('0'), ".0E")      '0E+0'
format(Decimal('0'), ".1E")      '0.0E+1'
format(Decimal('0'), ".2E")      '0.00E+2'
format(Decimal('0'), ".3E")      '0.000E+3'
format(Decimal('0'), ".30E")     '0.000000000000000000000000000000E+30'
format(Decimal('-0.0'), ".0E")   '-0E-1'
format(Decimal('-0.0'), ".1E")   '-0.0E+0'
format(Decimal('-0.0'), ".2E")   '-0.00E+1'
format(Decimal('-0.0'), ".3E")   '-0.000E+2'
format(Decimal('-0.0'), ".30E")  '-0.000000000000000000000000000000E+29'

# Compare different 0 values.
format(0, ".3E")                '0.000E+00'
format(0.0, ".3E")              '0.000E+00'
format(-0.0, ".3E")             '-0.000E+00'
format(0j, ".3E")               '0.000E+00+0.000E+00j'
format(Decimal('0'), ".3E")     '0.000E+3'
format(Decimal('-0.0'), ".3E")  '-0.000E+2'

# Compare different format methods.
format(Decimal('0'), ".30E")                          '0.000000000000000000000000000000E+30'
format(1234512345123451234512345, ".30E")             '1.234512345123451205320704000000E+24'
format(Decimal('1234512345123451234512345'), ".30E")  '1.234512345123451234512345000000E+24'
"%.30E" % (Decimal('0'),)                             '0.000000000000000000000000000000E+00'  # exponent is 0, because the value is converted to float first
"%.30E" % (1234512345123451234512345,)                '1.234512345123451205320704000000E+24'
"%.30E" % (Decimal('1234512345123451234512345'),)     '1.234512345123451205320704000000E+24'

NOTE: The internal exponent of the Decimal, rather than its sign, is causing the difference. It seems that the precision affects the displayed exponent.

  • When formatting Decimal('0'), the exponent is equal to precision.
  • When formatting Decimal('-0.0'), the exponent is equal to precision-1.
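The two observations above can be checked against the stored exponent. This is a minimal sketch, assuming the displayed exponent of a zero Decimal is its internal exponent plus the precision, as the test output suggests; `displayed_exponent` is a hypothetical helper written for this illustration, not part of the decimal module.

```python
from decimal import Decimal


def displayed_exponent(formatted: str) -> int:
    """Hypothetical helper: return the integer after the 'E' in an E-formatted string."""
    _, _, exponent = formatted.partition("E")
    return int(exponent)


# Collect (value, precision, internal exponent, displayed exponent).
rows = []
for value in (Decimal("0"), Decimal("-0.0")):
    internal = value.as_tuple().exponent  # 0 for '0', -1 for '-0.0'
    for p in (0, 1, 2, 3, 30):
        shown = displayed_exponent(format(value, f".{p}E"))
        rows.append((value, p, internal, shown))

for value, p, internal, shown in rows:
    print(f"{value!r:16} p={p:<3} internal={internal:<3} displayed={shown}")
```

In each row the displayed exponent should equal `internal + p`, which covers both bullet points with a single rule.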

In addition to the above behavior, the exponent is not padded to two digits, which also makes it inconsistent with the built-in types.

The results are numerically correct, and the documentation (https://docs.python.org/3/library/string.html#format-specification-mini-language) does not restrict the exponent when the coefficient is 0. However, this can be confusing for users.

Maybe the documentation needs a description of this. ~~note, or change the result of Decimal type to be consistent with built-in types.~~

Test script
from decimal import Decimal

templates = (
    'format({v}, ".{p}E")',
    # 'f"{{{v}:.{p}E}}"',
    # '"{{:.{p}E}}".format({v})',
    # '"%.{p}E" % ({v},)',

    # 'format({v}, ".{p}e")'
    # 'f"{{{v}:.{p}e}}"',
    # '"{{:.{p}e}}".format({v})',
    # '"%.{p}e" % ({v},)',
)

values = (
    # 0,
    # 0.0,
    # -0.0,
    # complex(0),
    Decimal('0'),
    Decimal('-0.0'),

    # 0.001,
    # Decimal('0.001'),

    # 1,
    # 1.0,
    # Decimal('1'),

    # 10,
    # 10.0,
    # Decimal('10'),

    # 100,
    # 100.0,
    # Decimal('100'),

    # 1234512345123451234512345,
    # 1234512345123451234512345.0,
    # Decimal('1234512345123451234512345'),
)

precision = (
    0,
    1,
    2,
    3,
    # 4,
    # 5,
    # 6,
    # 7,
    # 8,
    # 9,
    # 10,
    30,
    # 100,
    # 1000,
    # 10000,
)

results = [
    ((expr := t.format(v=repr(v), p=p)), eval(expr))
    for t in templates
        for v in values
            for p in precision
]

max_len_expr = max(len(expr) for expr, val in results)
for expr, val in results:
    print(f"{expr}{' ' * ((max_len_expr)-len(expr))}  {repr(val)}")

Linked PRs

  • gh-142084
  • gh-142813
  • gh-142814

1nftf avatar Nov 27 '25 17:11 1nftf

Decimal instances differ from floats in that they have significant trailing zeros. For example, Decimal("10"), Decimal("10.0") and Decimal("1e1") are distinct (although equal under ==) and the difference affects the output of some operations. So the base exponent of a Decimal instance is an extra piece of information beyond just the numerical value.
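A short illustration of that point, using only the public `as_tuple()` method and ordinary arithmetic. The variable names are arbitrary.

```python
from decimal import Decimal

a, b, c = Decimal("10"), Decimal("10.0"), Decimal("1e1")

# Equal in value ...
print(a == b == c)

# ... but stored with different coefficients and exponents.
for d in (a, b, c):
    print(repr(d), d.as_tuple())

# The stored exponent carries through to the results of operations.
print(a + 1, b + 1)
```

The three values compare equal, yet their exponents are 0, -1 and 1 respectively, and `a + 1` and `b + 1` print differently.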

To the extent that it's possible, formatting tries to preserve this extra information. That's the reason for the behaviour you're seeing here.

>>> x = Decimal('0')
>>> s = format(x, '.2e')
>>> s
'0.00e+2'
>>> y = Decimal(s)
>>> x.as_tuple() == y.as_tuple()
True

Above we've reconstructed y from its formatted representation, and the reconstructed y has exactly the same internal representation as x had. If s were "0.00e+0" instead, we would have lost the exponent information.
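The same round trip can be tried on a few more zeros. This is a sketch for illustration only; `round_trips` is a hypothetical helper, and the check is limited to zero values, which are the case discussed here.

```python
from decimal import Decimal

zeros = [Decimal("0"), Decimal("0.000"), Decimal("-0.0"), Decimal("0E+5")]


def round_trips(x: Decimal, spec: str = ".2e") -> bool:
    """Hypothetical helper: does format() followed by Decimal() restore x exactly?"""
    return Decimal(format(x, spec)).as_tuple() == x.as_tuple()


print([round_trips(z) for z in zeros])

# A fixed displayed exponent of 0 would always parse back with exponent -2,
# regardless of the exponent the original zero had.
print(Decimal("0.00e+0").as_tuple().exponent)
```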

In addition to the above behavior, the exponent is not padded to two digits, which also makes it inconsistent with the built-in types.

This is a separate issue: I'd recommend opening a separate tracker issue if you want to pursue this. But this behaviour is mandated by the specification that the decimal module implements, so changing it would be a tough sell. From the spec (emphasis mine):

this comprises the letter ‘E’ followed immediately by the adjusted exponent converted to a character form. The latter is in base ten, using the characters 0 through 9 with no leading zeros, always prefixed by a sign character
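To see the "no leading zeros" rule next to float's behaviour, the sketch below formats the same numbers both ways. The sample values are arbitrary and were chosen to have exactly three significant digits so that no rounding is involved.

```python
from decimal import Decimal

rows = []
for number in ("1.23e5", "1.23e-7", "1.23e123"):
    as_float = format(float(number), ".2e")
    as_decimal = format(Decimal(number), ".2e")
    rows.append((number, as_float, as_decimal))
    print(f"{number:>9}  float: {as_float:<10}  Decimal: {as_decimal}")
```

float pads the exponent to at least two digits, while Decimal writes only the digits the exponent needs.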

mdickinson avatar Nov 28 '25 10:11 mdickinson

Note that the spec itself does not prescribe how .e-style formatting should work: it describes two operations: to-scientific-string, which roughly corresponds to g-style formatting (but without precision control), and to-engineering-string, which is like e-style formatting (again with no precision control) but with the displayed exponent constrained to be a multiple of 3.

So there's some reading-between-the-lines necessary to implement full-fledged float-style formatting from Decimal.

But the spirit of preserving the exponent information is present in the to-engineering-string description. E.g., in the to-engineering-string specification:

if the number is a zero, the zero will have a decimal point and one or two trailing zeros added, if necessary, so that the original exponent of the zero would be recovered by the to-number conversion.
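In Python, the two spec operations are available as `str()` (to-scientific-string) and `Decimal.to_eng_string()` (to-engineering-string). A small sketch of both, including the zero case quoted above; the sample values are arbitrary.

```python
from decimal import Decimal

x = Decimal("1.23E+4")
print(str(x))             # to-scientific-string
print(x.to_eng_string())  # to-engineering-string: exponent is a multiple of 3

# For a zero, trailing zeros are added so that the exponent survives a round trip.
zero = Decimal("0E+1")
eng = zero.to_eng_string()
print(eng)
print(Decimal(eng).as_tuple() == zero.as_tuple())
```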

mdickinson avatar Nov 28 '25 11:11 mdickinson

Decimal represents not only an approximate value, but also its precision. Decimal('0') is not the same as Decimal('0.000'). The former represents a value between Decimal('-0.5') and Decimal('0.5'), the latter -- between Decimal('-0.0005') and Decimal('0.0005'). Formatting tries to preserve this property.

>>> format(Decimal('0'), ".2E")
'0.00E+2'
>>> Decimal('0.00E+2')
Decimal('0')
>>> format(Decimal('0.000'), ".2E")
'0.00E-1'
>>> Decimal('0.00E-1')
Decimal('0.000')

So this is not a bug, but a feature.

serhiy-storchaka avatar Nov 28 '25 11:11 serhiy-storchaka

This issue is more about formatting values with the e/E symbol (with a given precision), rather than converting the value to a scientific or engineering string.

When converting/serializing, you might want to keep as much information as possible.

When formatting, you usually intend to display them in a uniform format.

# These
format(Decimal('0.0'), ".3E")    '0.000E+2'
f"{Decimal('0.0'):.3E}"          '0.000E+2'
"{:.3E}".format(Decimal('0.0'))  '0.000E+2'
"%.3E" % (Decimal('0.0'),)       '0.000E+00'  # the Decimal is implicitly converted to a float

# Instead of these
Decimal('0.0').to_eng_string()              '0.0'
getcontext().to_eng_string(Decimal('0.0'))  '0.0'
getcontext().to_sci_string(Decimal('0.0'))  '0.0'

For users who are not familiar with the Decimal class (like me), it might be expected that Decimal behaves similarly to float. "0.000E+2" rather than "0.000E+00" is indeed a bit strange, and it took me a little time to find the reason.
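For readers who only want float-like display output, one possible workaround is to post-process the formatted string. This is a sketch under stated assumptions, not anything the decimal module provides: `format_e_like_float` is a hypothetical helper, and it deliberately discards the exponent information that Decimal's own formatting preserves.

```python
from decimal import Decimal


def format_e_like_float(value: Decimal, precision: int) -> str:
    """Hypothetical helper: 'e' formatting with a float-style exponent."""
    text = format(value, f".{precision}e")
    if not value.is_finite():
        # 'Infinity' and 'NaN' have no exponent part to rewrite.
        return text
    mantissa, _, exponent = text.partition("e")
    # Float style: a zero value always shows exponent 0.
    exp = 0 if value.is_zero() else int(exponent)
    sign = "-" if exp < 0 else "+"
    # Float style: pad the exponent to at least two digits.
    return f"{mantissa}e{sign}{abs(exp):02d}"


print(format_e_like_float(Decimal("0.0"), 3))
print(format_e_like_float(Decimal("12345"), 2))
```

Unlike "%.3e" % value, this never converts the Decimal to a float, so the coefficient digits are not subject to binary rounding.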

Documents compare

These two documents only define how to convert decimal floating-point numbers to strings, not how to format them.

  • https://speleotrove.com/decimal/daconvs.html
  • https://docs.python.org/3/library/decimal.html

The formatting behavior is defined in these documents.

  • https://docs.python.org/3/library/string.html#format-specification-mini-language
  • https://en.cppreference.com/w/c/io/fprintf (I think printf behavior can be seen as the de facto standard for c style formatting.)

The cppreference document requires the exponent to be 0 when the value is 0, and requires the exponent to be padded to two digits. This is also the behavior of Python's float.

However, the Python document omits these requirements. I'm not sure whether this was intentionally relaxed for Decimal, or whether the Decimal.__format__ implementer simply followed the simplified Python document.

Proposal

My opinion is that we can add a more detailed description of the different behavior.

~~But if possible, we can make Decimal's format behavior more similar to float, and then update the documentation to describe the printf-like behavior.~~


P.S. I never said this is a bug. The tag I gave this issue from the beginning was #docs, not #type-bug .

1nftf avatar Nov 28 '25 14:11 1nftf

I think this might be closed.

Though, maybe a new issue should be opened about the minimum number of digits in the exponent for the 'e'/'g' formats. I don't think that the decimal spec forbids using printf-style formatting here for Decimals as well. The Decimal constructor also accepts padding zeros in the exponent. The only question is: is it worth a backward compatibility break?
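The point about the constructor is easy to confirm. A minimal check with arbitrary sample strings:

```python
from decimal import Decimal

padded = Decimal("1.00E+05")
plain = Decimal("1.00E+5")

# Leading zeros in the exponent are accepted and change nothing.
print(padded.as_tuple() == plain.as_tuple())
print(Decimal("0.000E+00").as_tuple())
```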

skirpichev avatar Nov 29 '25 22:11 skirpichev

But a non-zero exponent will make the padding less useful. And the '%' formatting will yield a different result due to implicit conversion to float.

>>> format(1 / Decimal('inf'), ".2e")
'0.00e-1000024'
>>> "%.2e" % (1 / Decimal('inf'))
'0.00e+00'

1nftf avatar Nov 30 '25 06:11 1nftf

If the current Decimal format behavior is considered a feature and will not change in the future, a description like this could be added:

When the value is equal to zero, the exponent is always zero for float, and is the internal exponent plus the number of digits after the decimal point for Decimal.

1nftf avatar Nov 30 '25 09:11 1nftf

Comment posted on the wrong issue

I discovered the formatting difference while investigating #142019

I found this document first: https://docs.python.org/3/library/string.html#format-specification-mini-language.

When I tried to find another document, I searched for '% format' and '% formatting' but found nothing related to https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting.

I misunderstood that there is only one document about formatting, and that the basic functionalities of the format-specification mini-language are also applicable to % formatting, except for some additional functions.

Perhaps a note could be added at the beginning of https://docs.python.org/3/library/string.html#format-specification-mini-language stating that this is not suitable for % formatting, along with a link back to it.

1nftf avatar Nov 30 '25 22:11 1nftf

Comment posted on the wrong issue

I think linking to the original instead of rewriting is almost always a better solution.

Writing the same thing multiple times makes it easy to forget to synchronize changes later.

1nftf avatar Nov 30 '25 22:11 1nftf

@picnixz, do you think this makes any sense?

Maybe changing exponent formatting (padding to two digits)? But this is a compatibility break without clear benefits: I think that in most cases people will not mix Decimal's and floats in formatting code.

Though, maybe we should document this tiny difference in docs for 'e' formatting type.

skirpichev avatar Nov 30 '25 22:11 skirpichev

I don't know. I'm too sleepy to think about this. Honestly, I think the decimal docs are already burdened with enough details that this kind of information would be buried and still not useful. format and %-style behave differently. So we shouldn't necessarily expect the same output. If you want to document the difference, feel free to do so, but I wouldn't change the behavior unless there is a good reason to. People might rely on this inconsistency because they want implicit float conversion or because they don't care about the possible loss! So I would rather be against a behavior change.

picnixz avatar Nov 30 '25 23:11 picnixz

I'm too sleepy to think about this.

Oh, sorry to bother you.

If you want to document the difference, feel free to do so, but I wouldn't change the behavior unless there is a good reason to.

Let's do this. @1nftf, could you please adjust your PR to simply note that "For float the exponent always contains at least two digits." for the 'e' presentation type?

skirpichev avatar Nov 30 '25 23:11 skirpichev

~~I think I can complete this part and make it unambiguous.~~

~~I've already spent a lot of time on it.~~

Currently observed behavior (only checked on Windows: most x64 releases, some x86 releases):

  • If a float value equals 0, the exponent is also 0.
  • If a Decimal value equals 0, the exponent might not be 0.
  • float pads the exponent to two digits.
  • Decimal never pads the exponent.

1nftf avatar Nov 30 '25 23:11 1nftf

~~After figuring all this out, I will create a PR and highlight the key routines.~~

~~I have little to no experience with real-world C projects, thus need an experienced contributor to help me review it.~~

1nftf avatar Nov 30 '25 23:11 1nftf

I will create a PR and highlight the key routines.

Please don't. Could you just modify your pr to include a simple note, as suggested above: "For float the exponent always contains at least two digits."?

skirpichev avatar Dec 01 '25 00:12 skirpichev

This is what most users would expect:

For float, the exponent always contains at least two digits.

I would like to add a description like this:

For a given Decimal object obj, if obj == 0, the exponent will be obj.as_tuple()[2] + p instead of always being 0.

@skirpichev If you’re on board with this, I’ll go ahead and change the PR.

1nftf avatar Dec 01 '25 14:12 1nftf

See this comment for aggregated information.

Sorry for my verbose comments, bad layout, and some mistakes, which made this issue a little off topic.

I have already folded and revised many of them to make the thread easier to read.

1nftf avatar Dec 01 '25 15:12 1nftf

For a given Decimal object obj, if obj == 0, the exponent will be obj.as_tuple()[2] + p instead of always being 0.

I'm not sure if this is a good idea. There is nothing special in handling zero for Decimals, as it was explained above. Maybe: "For float the exponent always contains at least two digits and it's zero if the value is zero."

skirpichev avatar Dec 05 '25 06:12 skirpichev