rust icon indicating copy to clipboard operation
rust copied to clipboard

Fix inconsistent rounding of 0.5 when formatted to 0 decimal places

Open ajtribick opened this issue 1 year ago • 5 comments

As described in #70336, when displaying values to zero decimal places the value of 0.5 is rounded to 1, which is inconsistent with the display of other half-integer values which round to even.

From testing the flt2dec implementation, it looks like this comes down to the condition in the fixed-width Dragon implementation where an empty buffer is treated as a case to apply rounding up. I believe the change below fixes it and updates only the relevant tests.

Nevertheless I am aware this is very much a core piece of functionality, so please take a very careful look to make sure I haven't missed anything. I hope this change does not break anything in the wider ecosystem as having a consistent rounding behaviour in floating point formatting is in my opinion a useful feature to have.

Resolves #70336

ajtribick avatar Oct 11 '22 21:10 ajtribick

r? @m-ou-se

(rust-highfive has picked a reviewer for you, use r? to override)

rust-highfive avatar Oct 11 '22 21:10 rust-highfive

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

rustbot avatar Oct 11 '22 21:10 rustbot

I've run a comparison for the formats {:.0} {:.1} and {:.9} for all positive finite f32 values, the only example affected was 0.5 in format {:.0}.

ajtribick avatar Oct 14 '22 19:10 ajtribick

@rustbot label +T-libs-api -T-libs

ajtribick avatar Oct 17 '22 17:10 ajtribick

Quick demonstration that .0 is rounding odd while the other numbers of decimal places are rounding even:

[src/main.rs:2] format!("{:.0}", 0.5) = "1"
[src/main.rs:3] format!("{:.1}", 0.25) = "0.2"
[src/main.rs:4] format!("{:.1}", 0.75) = "0.8"
[src/main.rs:5] format!("{:.2}", 0.125) = "0.12"
[src/main.rs:6] format!("{:.2}", 0.875) = "0.88"
[src/main.rs:7] format!("{:.3}", 0.0625) = "0.062"

https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=a0ee562de13b361cc6a729d64e482ae8

scottmcm avatar Oct 20 '22 18:10 scottmcm

And just to show that it is specifically the behaviour of 0.5 that is the issue here:

[src/main.rs:2] format!("{:.0}", 0.5) = "1"
[src/main.rs:3] format!("{:.0}", 1.5) = "2"
[src/main.rs:4] format!("{:.0}", 2.5) = "2"
[src/main.rs:5] format!("{:.0}", 3.5) = "4"
[src/main.rs:6] format!("{:.0}", 4.5) = "4"
[src/main.rs:7] format!("{:.0}", 5.5) = "6"

https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=68c6c78969ae752c38e5ffcd769eabd4

ajtribick avatar Oct 20 '22 18:10 ajtribick

Oh! I hadn't realized it was just 0.5 with the problem. That pushes me to "this is just a bug", not really a decision to be made. (Still worth a relnotes mention, but might not even need an FCP.)

@rust-lang/libs-api would you be comfortable calling this a bug that can just be fixed, or do you need an FCP for it? @rustbot label +I-libs-api-nominated

scottmcm avatar Oct 20 '22 19:10 scottmcm

Quick demonstration that .0 is rounding odd while the other numbers of decimal places are rounding even:

[src/main.rs:2] format!("{:.0}", 0.5) = "1"
[src/main.rs:3] format!("{:.1}", 0.25) = "0.2"
[src/main.rs:4] format!("{:.1}", 0.75) = "0.8"
[src/main.rs:5] format!("{:.2}", 0.125) = "0.12"
[src/main.rs:6] format!("{:.2}", 0.875) = "0.88"
[src/main.rs:7] format!("{:.3}", 0.0625) = "0.062"

https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=a0ee562de13b361cc6a729d64e482ae8

I'm sorry if I'm misunderstanding, I found a lot of other values, as shown in #70336 (for "to even" as well as "away from zero"). I was originally thinking it was supposed to round away from zero, but if fmt is supposed to round to even, I can adapt my test to hunt all the values down. Unless you're talking about the fix?

For example:

[src/main.rs:9] format!("{:.1}", 0.35) = "0.3"
[src/main.rs:10] format!("{:.2}", 0.155) = "0.15"
[src/main.rs:11] format!("{:.2}", 0.165) = "0.17"

EDIT: testing all the values at different depth (with a '5' at depth+1), I found about 50% of wrong rounding in the original fmt, no matter which mode (slightly fewer errors if we expect even but it's negligible). I haven't tested more complicated cases.

blueglyph avatar Oct 25 '22 13:10 blueglyph

[src/main.rs:9] format!("{:.1}", 0.35) = "0.3" [src/main.rs:10] format!("{:.2}", 0.155) = "0.15" [src/main.rs:11] format!("{:.2}", 0.165) = "0.17"

Remember that anything that's not a multiple of an integer power of two doesn't really exist in floating-point numbers.

0.35_f64 is the most convenient way to write the floating point value that's exactly

0.34999999999999997779553950749686919152736663818359375​

And thus it rounds down https://float.exposed/0x3fd6666666666666

Similarly, 0.155 and 0.165 are

0.1549999999999999988897769753748434595763683319091796875​ 0.1650000000000000077715611723760957829654216766357421875​

And thus they're not actually ties either.

It makes sense that you'd get about 50% of them rounding the other way, because for things that aren't representable exactly, you'd expect about 50% of them to be slightly too big and 50% of them to be slightly too small.

(I also made this mistake originally, but realized in time, and thus my examples above all use fractions where the denominator is a power of two -- ½, ¼, ¾, ⅛, ⅞, 1⁄16.)

Edit: Oh, hi @ajtribick, apparently we were racing on this one 😅

scottmcm avatar Oct 25 '22 16:10 scottmcm

@scottmcm @ajtribick You're right that they are not round values, but I thought it was precisely the problem solved by those algorithms. I'm not familiar enough with Grisu3/Dragon4 but if I take the decoded significand and base-10 exponent in the case of Dragonbox, I get those (u64, i32) values:

0.35  => sig=3500000000000000, exp=-16
0.155 => sig=1550000000000000, exp=-16
0.175 => sig=1750000000000000, exp=-16

I suppose the current algorithms operate differently. Yet it's strange that it fails on 50% of the ties, that's a lot.

blueglyph avatar Oct 25 '22 17:10 blueglyph

@blueglyph - as I understand it, Dragonbox is an algorithm for producing the shortest roundtrippable representation of a floating point value (i.e. the operation corresponding to the {} format string: without a specified precision), not a floating point value rounded to a specified number of decimal places. This is a different operation and not necessarily the same as what you would get by operating on the decimal representation produced by the shortest representation algorithm.

FWIW charconv in MSVC behaves in the same way:

#include <array>
#include <charconv>
#include <iostream>
#include <string_view>
#include <system_error>

int main()
{
    std::array<double, 7> values = { 0.5, 0.25, 0.75, 0.125, 0.35, 0.155, 0.165 };
    std::array<char, 10> buffer{ 0 };

    for (double value : values)
    {
        if (auto [ptr, ec] = std::to_chars(buffer.data(), buffer.data() + buffer.size(), value, std::chars_format::fixed); ec == std::errc{})
        {
            std::cout << std::string_view(buffer.data(), ptr - buffer.data());
        }
        else
        {
            std::cout << "(FAIL)";
        }

        for (int precision = 0; precision < 3; ++precision)
        {
            if (auto [ptr, ec] = std::to_chars(buffer.data(), buffer.data() + buffer.size(), value, std::chars_format::fixed, precision); ec == std::errc{})
            {
                std::cout << '\t' << std::string_view(buffer.data(), ptr - buffer.data());
            }
            else
            {
                std::cout << "\t(FAIL)";
            }
        }
        std::cout << '\n';
    }
}

gives the following output (first column is shortest representation, followed by fixed precisions of 0, 1, 2 decimal places):

0.5     0       0.5     0.50
0.25    0       0.2     0.25
0.75    1       0.8     0.75
0.125   0       0.1     0.12
0.35    0       0.3     0.35
0.155   0       0.2     0.15
0.165   0       0.2     0.17

ajtribick avatar Oct 25 '22 18:10 ajtribick

@blueglyph Note that floating-point algorithms are expected to treat the floating point number as if the represented value was exactly the desired value, and not to attempt to divine whether it came from rounding something else.

That's why, for example, it's absolutely correct that PI.sin() is not zero. PI is very close to π, but is not exactly π. So "turn it into a rounded string" also follows those rules, and even if there's a tie in the ±½ULP range around the exact floating point number, that doesn't mean it follows the tie rules. It's only a tie if the floating-point number itself is exactly on a tie.

scottmcm avatar Oct 25 '22 18:10 scottmcm

@blueglyph - as I understand it, Dragonbox is an algorithm for producing the shortest roundtrippable representation of a floating point value (i.e. the operation corresponding to the {} format string: without a specified precision), not a floating point value rounded to a specified number of decimal places. This is a different operation and not necessarily the same as what you would get by operating on the decimal representation produced by the shortest representation algorithm.

Dragonbox is an algorithm to convert a binary floating-point value into a correctly rounded decimal representation. From there I don't see what you wouldn't be able to reduce the precision as you wish. But it is arguably another algorithm than the one used.

blueglyph avatar Oct 25 '22 20:10 blueglyph

@blueglyph Note that floating-point algorithms are expected to treat the floating point number as if the represented value was exactly the desired value, and not to attempt to divine whether it came from rounding something else.

I would agree for operations on floating-points (*), but I'm not convinced it's true for their decimal representation, since we know the base 2 representation is not exactly the desired value, as we saw earlier.

(*) although they are rounded to even to prevent bias errors, so in a way we are already "attempting to divine" whether they should be pre-processed in the case of accumulation.

That's why, for example, it's absolutely correct that PI.sin() is not zero. PI is very close to π, but is not exactly π. So "turn it into a rounded string" also follows those rules, and even if there's a tie in the ±½ULP range around the exact floating point number, that doesn't mean it follows the tie rules. It's only a tie if the floating-point number itself is exactly on a tie.

PI is an irrational number, so the problem is different, and we're not applying the perfect sin function either but only an approximation. But I see what you mean by that.

blueglyph avatar Oct 25 '22 20:10 blueglyph

Dragonbox is an algorithm to convert a binary floating-point value into a correctly rounded decimal representation. From there I don't see what you wouldn't be able to reduce the precision as you wish. But it is arguably another algorithm than the one used.

The output of Dragonbox is "correctly rounded" in the sense that it gives the shortest possible decimal representation of the value that can be parsed to give back the exact same input. It does not produce the exact representation of the numerical value of the variable. Note that rounding twice is not equivalent to rounding once, e.g. in decimal mathematics:

  • 1.47 rounded to 1 decimal place is 1.5, 1.5 rounded to 0 decimal places is 2.
  • 1.47 rounded to 0 decimal places is 1.

Reducing the precision of the Dragonbox output is likewise performing double rounding, and will lead to different results than rounding the value directly. The value of 0.35_f64 is less than 0.35 (once the program is compiled it cannot tell whether you wrote 0.35_f64 or 0.34999999999999997779553950749686919152736663818359375_f64 in the source code), and thus should be rounded down when rounding to 1 decimal place.

In any case, I fear this is going off on an irrelevant tangent to the change being done here: the value of 0.5 is exactly representable as both f32 and f64, and is being rounded inconsistently with every other situation where an exact tie occurs.

ajtribick avatar Oct 25 '22 21:10 ajtribick

but I'm not convinced it's true for their decimal representation, since we know the base 2 representation is not exactly the desired value, as we saw earlier.

I'm not sure what you mean by decimal representation here.

Do you agree that this is correct?

format!("{:.1}", 3152519739159347.0 / 9007199254740992.0) => "0.3"

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=57517cb288ef19ff42707ea1e866e4dc

scottmcm avatar Oct 25 '22 22:10 scottmcm

I'm sorry this is going out of the initial topic, but this claims to solve #70336 and my argument is about that part. If more appropriate, it could be discussed elsewhere.

Grisu, like Dragonbox, aim to convert a binary-coded floating points to a decimal representation which is correct (in the sense mentioned earlier, the parsed decimal representation yields the same binary coding). Here the coding is typically IEEE-754, with a current focus on double precision.

The above conclusion "thus should be rounded down when rounding to 1 decimal place" seems strange to me, because there is no way to assert that.

For example, 0.35 is coded like this:

6305039478318694*2^-54 or 0x16666666666666*2^0xffca => 0.34999999999999997779553950749686919152736663818359375

The value has been rounded to the 54-bit normalized mantissa using the round-even method, and indeed it's not a tie anymore (in decimal representation). If I increase the mantissa, I get this value:

6305039478318695*2^-54 or 0x16666666666667*2^0xffca => 0.350000000000000033306690738754696212708950042724609375

which is on the other side of the tie. So we see here that the error introduced when parsing "0.35" and initially rounding the value to code it as an f64 doesn't allow us to deduce on which side of the tie the initial value was.

Thus the operation that Dragonbox (or Grisu, Ryu, etc) does to minimize the output length while preserving the information and keeping a "correct" output, is not wrong. And it looks like it's more likely to be what the initial value was when it matters:

  • if the initial value was 0.35, that is what we get
  • if the initial value was 0.349999999999999978, likely as a result of a calculation, does it matter if the displayed value is "0.35"? No, because the error margin is the same. Likewise for rounding as "0.4" with {:.1} by assuming it was a tie.

At least that's how I understand the asymmetric problem of coding and displaying floating-point values respectively in base 2 and 10.

If we come back to the initial problem to solve, the current solution seems to be wrong in about 50% of the cases, which is a lot. So I'm only wondering whether there could be another problem than just the case of 0.5, but maybe in the scope of another code change.

I'll stop here, however, and sorry again if that was not the place.

I'm not sure what you mean by decimal representation here.

Do you agree that this is correct?

format!("{:.1}", 3152519739159347.0 / 9007199254740992.0) => "0.3"

I mean by that the decimal interpretation of the value which is coded in binary (IEEE-754). How the value is printed, if you will. Hopefully the first part of my post already clarified that. :)

I don't see the point of the last question.

blueglyph avatar Oct 26 '22 08:10 blueglyph

I'm sorry this is going out of the initial topic, but this claims to solve #70336 and my argument is about that part. If more appropriate, it could be discussed elsewhere.

I claim this issue solves #70336 because it addresses the issue as described in the very first post there: the rounding of half-integer values follows round-ties-to-even rules except for the case of 0.5.

If you need to have 0.35 behave as exactly halfway between 0.3 and 0.4, finite-length binary floating point types like f32 and f64 are not appropriate since they cannot represent these values, you should use a decimal floating point type instead.

ajtribick avatar Oct 26 '22 16:10 ajtribick