Change smallest_power_of_ten to -64 for floats.
18446744073709551615e-65 rounds to 0, and it is the largest possible value (the largest 64-bit significand) with exponent -65. Therefore, the minimal exponent for which we don't always round to zero is -64.
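This claim is easy to sanity-check outside the library. A small Python sketch (the `to_f32` helper is introduced here for illustration; it is not part of fast_float):

```python
import struct

def to_f32(s: str) -> float:
    # Parse the decimal string, then round the result to binary32 and back.
    return struct.unpack("<f", struct.pack("<f", float(s)))[0]

# Largest 64-bit significand (2**64 - 1) with decimal exponent -65:
# the value is ~1.84e-46, below half the smallest subnormal, so it rounds to 0.
assert to_f32("18446744073709551615e-65") == 0.0

# With exponent -64 the same significand is ~1.84e-45 and survives as a subnormal.
assert to_f32("18446744073709551615e-64") != 0.0
```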
Thanks !!!
Merge conflict resolved.
I rebased on top of the right branch, sorry for the confusion. Is it good to merge? If not, what's missing?
@deadalnix I don't think that there was necessarily a problem with the PR... it did not seem necessary to sync. I do need to carefully check the changes to make sure that we don't introduce bugs. I expect to eventually merge this PR.
@lemire where do these values come from? I am trying to figure out a way to determine them programmatically.
largest_power_of_ten seems to be floor(exponent_bias / log2(10)), which returns 38 for f32 and 308 for f64. This makes sense (can't represent an exponent larger than the max binary exponent) and does line up with the literals in this repo.
smallest_power_of_ten makes less sense. I think it is -(exponent_bias + mantissa_bits)/log2(10) (possibly ± an off-by-one error somewhere), which seems to make sense because the smallest positive exponent should be the minimum exponent plus the additional places gained by representing a subnormal, converted from base 2 to base 10. However, this gives values of -45 and -323, way off from the -65 and -342 present in this repository.
My calculations do seem to align with http://weitz.de/ieee/ from a few spot checks, e.g. 1e-45 is representable as a f32 (0x1), but 1e-46 gets rounded to 0.
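Those spot checks can be reproduced with a small Python helper (an illustration only; `f32_bits` is a name introduced here):

```python
import struct

def f32_bits(s: str) -> int:
    # Bit pattern of the decimal string after rounding to binary32.
    return struct.unpack("<I", struct.pack("<f", float(s)))[0]

assert f32_bits("1e-45") == 0x1  # smallest positive subnormal
assert f32_bits("1e-46") == 0x0  # rounds to zero
```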
Edit: but there seems to be some kind of ripple effect in the algorithm, even though the math looks right at a surface level. Testing in the Rust stdlib, changing SMALLEST_POWER_OF_TEN to `(-((Self::EXPONENT_BIAS + Self::MANTISSA_BITS) as f64) / LOG2_10) as i32 - 1` produces the constants -46 (f32) and -324 (f64), which seems like it would be correct based on the direct input->output relationship, but must have some side effect based on the actual output. Failures after changing:
- f64 from "2.225073858507201136057409796709131975934819546351645648023426109724822222021076945516529523908135087914149158913039621106870086438694594645527657207407820621743379988141063267329253552286881372149012981122451451889849057222307285255133155755015914397476397983411801999323962548289017107081850690630666655994938275772572015763062690663332647565300009245888316433037779791869612049497390377829704905051080609940730262937128958950003583799967207254304360284078895771796150945516748243471030702609144621572289880258182545180325707018860872113128079512233426288368622321503775666622503982534335974568884423900265498198385487948292206894721689831099698365846814022854243330660339850886445804001034933970427567186443383770486037861622771738545623065874679014086723327636718749999999999999999999999999999999999999e-308" should be `0x0001999999999999` but rounds to 0. Note that -308 > -324, so it shouldn't be hitting the zero fast path, but it is rounding to 0 for some reason.
- f32 from "1.175494140627517859246175898662808184331245864732796240031385942718174675986064769972472277004271745681762695312500000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000e-38" should be `0x007FFFFE` but rounds to 0. Again, -38 > -45, so it shouldn't round, but it hits a zero fast path somewhere.
Changing LARGEST_POWER_OF_TEN to the expression I mentioned above seems to work okay; it is only changing the smallest that breaks things.
My test branch of generic-ifying things is at https://github.com/tgross35/rust/blob/c0d27fa108a2215bdbcdb2fe1e5d89a62ded51ac/library/core/src/num/dec2flt/float.rs#L89-L100; setting the SMALLEST_POWER_OF_TENs back to the constants makes things work.
@tgross35

> where do these values come from?
Sure. Please see the papers where this is discussed in detail.
- Fast Number Parsing Without Fallback, Software: Practice and Experience 53 (7), 2023
- Number Parsing at a Gigabyte per Second, Software: Practice and Experience 51 (8), 2021
The paper does say:
> We have that w × 10^−343 < 2^−1074 for all w < 2^64. Thus if the decimal exponent is smaller than -342, then the number must be considered to be zero. If the decimal exponent is greater than 308, the result must be infinite (beyond the range).
But it doesn't mention the significance of these numbers, at least not that I can find. 1074 is bias + mantissa bits, and 64 seems to be the same constant for both f32 and f64. My guess is that it is there to account for a possible shift within the internal u64 exponent representation.
Going off of that:
- qmin,shifted = -⌊(exponent_bias + mantissa_bits + internal_rep_bits) / log2(10)⌋
- qmax,shifted = ⌊(exponent_bias + 1) / log2(10)⌋
Which gives the following:
- -26..4 for f16. Largest (4) matches https://github.com/fastfloat/fast_float/issues/148, but the smallest value there (-27) is one less than the calculated smallest (-26)
- -59..38 for bf16. Largest (38) matches https://github.com/fastfloat/fast_float/issues/148, but the smallest value there (-60) is one less than the calculated smallest (-59)
- -64..38 for f32. Largest (38) matches what is in this repo. Smallest (-64) matches what is in this PR
- -342..308 for f64. Largest (308) and smallest (-342) both match what is in this repo.
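For what it's worth, the two formulas above can be sketched in a few lines of Python, assuming the standard IEEE exponent biases (15, 127, 127, 1023) and explicit-mantissa widths, with internal_rep_bits = 64 (these names and assumptions are mine, not the library's):

```python
import math

LOG2_10 = math.log2(10)
INTERNAL_REP_BITS = 64  # assumed width of the u64 holding the parsed significand

# (exponent_bias, mantissa_bits); bf16 shares f32's exponent range.
FORMATS = {"f16": (15, 10), "bf16": (127, 7), "f32": (127, 23), "f64": (1023, 52)}

def q_min(bias: int, mantissa_bits: int) -> int:
    return -math.floor((bias + mantissa_bits + INTERNAL_REP_BITS) / LOG2_10)

def q_max(bias: int) -> int:
    return math.floor((bias + 1) / LOG2_10)

for name, (bias, mantissa_bits) in FORMATS.items():
    print(f"{name}: {q_min(bias, mantissa_bits)}..{q_max(bias)}")
```

This prints `-26..4`, `-59..38`, `-64..38`, and `-342..308` respectively, in line with the constants discussed in this thread.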
I have not tested against the f16 or bf16 algorithms. Does what is proposed here seem accurate?
(cc @jakubjelinek from #148, how did you come up with the values there? And is it possible that -26 and -59 could be the correct min powers of 10?)
> We have that w × 10^−343 < 2^−1074 for all w < 2^64. Thus if the decimal exponent is smaller than -342, then the number must be considered to be zero. If the decimal exponent is greater than 308, the result must be infinite (beyond the range).
>
> But it doesn't mention the significance of these numbers, at least that I can find.
I am not sure what you mean by "doesn't mention the significance of these numbers".
Let me be explicit:
- `10**309` is beyond the range of binary64 and is thus infinite. From the paper:
- `2**-1074` is effectively zero in binary64. From the paper:
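Both points are easy to confirm with any correctly rounded binary64 parser, e.g. Python's:

```python
assert float("1e309") == float("inf")   # beyond the binary64 range
assert float("1e308") != float("inf")   # still finite
assert 2.0**-1074 > 0.0                 # smallest positive subnormal, nonzero
assert float("1e-400") == 0.0           # far below 2**-1074: underflows to zero
```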
> Does what is proposed here seem accurate?
Can you please explicitly relate your comments to @deadalnix's PR?
I do see the numbers in the paper, but I do not see a derivation or relationship with {largest,smallest}_power_of_ten, or a way to calculate those numbers from properties of the type. The bit you cite says that the smallest positive value for binary64 is 2^-1074 = 2^-(bias + mantissa_bits - 1); that part makes sense.
The problem: floor(log10(2^-1074)) is -324, which should be the "smallest power of 10". However, smallest_power_of_ten in this repository, the python file, and the paper is the magic number -342, which I have not seen explained.
I am trying to explain mathematically why this is specifically -342, not -324 or -341 or -343 or anything else. Everything from ~"10e-324" downward already rounds to 0, including "10e-341".
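The -324 arithmetic boundary can be checked directly (Python's parser is correctly rounded for binary64):

```python
import math

# floor(log10(2**-1074)) gives the naive "smallest power of ten" of -324.
assert math.floor(math.log10(2.0**-1074)) == -324

assert float("1e-323") != 0.0  # still a representable subnormal
assert float("1e-324") == 0.0  # rounds to zero
assert float("1e-341") == 0.0  # also zero, yet -341 > -342
```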
So I am working backwards to try to understand this. As I quoted above from "Fast Number Parsing":
> We have that w × 10^−343 < 2^−1074 for all w < 2^64. Thus if the decimal exponent is smaller than -342, then the number must be considered to be zero. If the decimal exponent is greater than 308, the result must be infinite (beyond the range).
This allows us to arrive at -⌊(exponent_bias + mantissa_bits + magic_number_64) / log2(10)⌋. Because magic_number_64 needs to be the same for both f32 and f64 (i.e., it is not the bit width; plugging in 32 produces an inaccurate answer for f32), I am suggesting that magic_number_64 may be the width of the internal exponent repr (a uint64_t in https://github.com/fastfloat/fast_float/blob/1fc3ac3932220b5effaca7203bb1bb771528d256/include/fast_float/decimal_to_binary.h#L94).
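The paper's inequality itself can be verified exactly with rational arithmetic, which also shows where the 64 enters: w is bounded by 2^64 (a sketch; `Fraction` keeps everything exact):

```python
from fractions import Fraction

# w * 10**-343 < 2**-1074 for all w < 2**64: checking the largest w suffices.
w_max = 2**64 - 1
assert Fraction(w_max, 10**343) < Fraction(1, 2**1074)

# The bound is tight: at exponent -342 the largest significand is no longer
# below the smallest subnormal, so -342 cannot be unconditionally zeroed.
assert Fraction(w_max, 10**342) >= Fraction(1, 2**1074)
```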
The relationship to this PR is that it should be trivial to verify the suggested change if such a derivation of the magic number exists. My formula above can produce the number in this PR, suggesting the change is correct; I am asking whether my formula is correct.
The relation to everything else is to aid in applying this algorithm to new float types that are not specified in the paper. E.g., if my formula above is indeed correct, then GCC may have an incorrect (but still functional) smallest_power_of_ten for f16 and bf16.
@tgross35 If you believe that there is a bug in GCC, please report the issue with GCC. At this time, fast_float does not yet support 16-bit floats. This pull request by @deadalnix is not related to 16-bit floats.
Your comments are misplaced. Please take them where they belong.
Let me be clear. This PR is not about 16-bit floats and we are not currently supporting 16-bit floats.
GCC does support 16-bit floats, but that is the wrong place to discuss issues with GCC. Please raise the issue where it belongs.
@deadalnix Merged. Thanks.