Change smallest_power_of_ten to -64 for floats.
18446744073709551615e-65 rounds to 0, and it is the largest possible value (the largest 64-bit significand) with exponent -65. Therefore, the minimal exponent for which we don't always round to zero is -64.
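This claim is easy to sanity-check outside the library. A small Python sketch (the `to_f32` helper is introduced here for illustration; it is not part of fast_float):

```python
import struct

def to_f32(s: str) -> float:
    # Parse the decimal string, then round the result to binary32 and back.
    return struct.unpack("<f", struct.pack("<f", float(s)))[0]

# Largest 64-bit significand (2**64 - 1) with decimal exponent -65:
# the value is ~1.84e-46, below half the smallest subnormal, so it rounds to 0.
assert to_f32("18446744073709551615e-65") == 0.0

# With exponent -64 the same significand is ~1.84e-45 and survives as a subnormal.
assert to_f32("18446744073709551615e-64") != 0.0
```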
Thanks !!!
Merge conflict resolved.
I rebased on top of the right branch, sorry for the confusion. Is it good to merge? If not, what's missing?
@deadalnix I don't think that there was necessarily a problem with the PR... it did not seem necessary to sync. I do need to carefully check the changes to make sure that we don't introduce bugs. I expect to eventually merge this PR.
@lemire where do these values come from? I am trying to figure out a way to determine them programmatically.
largest_power_of_ten seems to be floor(exponent_bias / log2(10)), which returns 38 for f32 and 308 for f64. This makes sense (can't represent an exponent larger than the max binary exponent) and does line up with the literals in this repo.
smallest_power_of_ten makes less sense. I think it is -(exponent_bias + mantissa_bits)/log2(10) (possibly ± an off-by-one error somewhere), which seems to make sense because the smallest positive exponent should be the minimum exponent plus the additional places gained by representing a subnormal, converted from base 2 to base 10. However, this gives values of -45 and -323, way off from the -65 and -342 present in this repository.
My calculations do seem to align with http://weitz.de/ieee/ from a few spot checks, e.g. 1e-45 is representable as a f32 (0x1), but 1e-46 gets rounded to 0.
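Those spot checks can be reproduced with a small Python helper (an illustration only; `f32_bits` is a name introduced here):

```python
import struct

def f32_bits(s: str) -> int:
    # Bit pattern of the decimal string after rounding to binary32.
    return struct.unpack("<I", struct.pack("<f", float(s)))[0]

assert f32_bits("1e-45") == 0x1  # smallest positive subnormal
assert f32_bits("1e-46") == 0x0  # rounds to zero
```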
Edit: but there seems to be some kind of ripple effect in the algorithm, even though the math looks right at a surface level. Testing in the Rust stdlib, changing SMALLEST_POWER_OF_TEN to `(-((Self::EXPONENT_BIAS + Self::MANTISSA_BITS) as f64) / LOG2_10) as i32 - 1` produces the constants -46 (f32) and -324 (f64), which seems like it would be correct based on the direct input->output relationship, but must have some side effect based on the actual output. Failures after changing:
- f64 from "2.225073858507201136057409796709131975934819546351645648023426109724822222021076945516529523908135087914149158913039621106870086438694594645527657207407820621743379988141063267329253552286881372149012981122451451889849057222307285255133155755015914397476397983411801999323962548289017107081850690630666655994938275772572015763062690663332647565300009245888316433037779791869612049497390377829704905051080609940730262937128958950003583799967207254304360284078895771796150945516748243471030702609144621572289880258182545180325707018860872113128079512233426288368622321503775666622503982534335974568884423900265498198385487948292206894721689831099698365846814022854243330660339850886445804001034933970427567186443383770486037861622771738545623065874679014086723327636718749999999999999999999999999999999999999e-308" should be `0x0001999999999999` but rounds to 0. Note that -308 > -324, so it shouldn't be hitting the zero fast path, but it is rounding to 0 for some reason.
- f32 from "1.175494140627517859246175898662808184331245864732796240031385942718174675986064769972472277004271745681762695312500000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000e-38" should be `0x007FFFFE` but rounds to 0. Again, -38 > -45, so it shouldn't round, but it hits a zero fast path somewhere.
Changing LARGEST_POWER_OF_TEN to the expression I mentioned above seems to work okay; it is only changing the smallest that breaks things.
My test branch of generic-ifying things is at https://github.com/tgross35/rust/blob/c0d27fa108a2215bdbcdb2fe1e5d89a62ded51ac/library/core/src/num/dec2flt/float.rs#L89-L100; setting the SMALLEST_POWER_OF_TENs back to the constants makes things work.
@tgross35

> where do these values come from?
Sure. Please see the papers where this is discussed in detail.
- Fast Number Parsing Without Fallback, Software: Practice and Experience 53 (7), 2023
- Number Parsing at a Gigabyte per Second, Software: Practice and Experience 51 (8), 2021
The paper does say:
> We have that w × 10^−343 < 2^−1074 for all w < 2^64. Thus if the decimal exponent is smaller than -342, then the number must be considered to be zero. If the decimal exponent is greater than 308, the result must be infinite (beyond the range).
But it doesn't mention the significance of these numbers, at least not that I can find. 1074 is bias + mantissa bits, and 64 seems to be the same constant for both f32 and f64. My guess is that it is there to account for a possible shift within the internal u64 exponent representation.
Going off of that:
- qmin,shifted = -⌊(exponent_bias + mantissa_bits + internal_rep_bits) / log2(10)⌋
- qmax,shifted = ⌊(exponent_bias + 1) / log2(10)⌋
Which gives the following:
- -26..4 for f16. Largest (4) matches https://github.com/fastfloat/fast_float/issues/148, but the smallest value there (-27) is one less than the calculated smallest (-26)
- -59..38 for bf16. Largest (38) matches https://github.com/fastfloat/fast_float/issues/148, but the smallest value there (-60) is one less than the calculated smallest (-59)
- -64..38 for f32. Largest (38) matches what is in this repo. Smallest (-64) matches what is in this PR
- -342..308 for f64. Largest (308) and smallest (-342) both match what is in this repo.
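For what it's worth, the two formulas above can be sketched in a few lines of Python, assuming the standard IEEE exponent biases (15, 127, 127, 1023) and explicit-mantissa widths, with internal_rep_bits = 64 (these names and assumptions are mine, not the library's):

```python
import math

LOG2_10 = math.log2(10)
INTERNAL_REP_BITS = 64  # assumed width of the u64 holding the parsed significand

# (exponent_bias, mantissa_bits); bf16 shares f32's exponent range.
FORMATS = {"f16": (15, 10), "bf16": (127, 7), "f32": (127, 23), "f64": (1023, 52)}

def q_min(bias: int, mantissa_bits: int) -> int:
    return -math.floor((bias + mantissa_bits + INTERNAL_REP_BITS) / LOG2_10)

def q_max(bias: int) -> int:
    return math.floor((bias + 1) / LOG2_10)

for name, (bias, mantissa_bits) in FORMATS.items():
    print(f"{name}: {q_min(bias, mantissa_bits)}..{q_max(bias)}")
```

This prints `-26..4`, `-59..38`, `-64..38`, and `-342..308` respectively, in line with the constants discussed in this thread.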
I have not tested against the f16 or bf16 algorithms. Does what is proposed here seem accurate?
(cc @jakubjelinek from #148, how did you come up with the values there? And is it possible that -26 and -59 could be the correct min powers of 10?)
> We have that w × 10^−343 < 2^−1074 for all w < 2^64. Thus if the decimal exponent is smaller than -342, then the number must be considered to be zero. If the decimal exponent is greater than 308, the result must be infinite (beyond the range).
>
> But it doesn't mention the significance of these numbers, at least that I can find.
I am not sure what you mean by "doesn't mention the significance of these numbers".
Let me be explicit:
- `10**309` is beyond the range of binary64 and is thus infinite. From the paper:
- `2**-1074` is effectively zero in binary64. From the paper:
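Both points are easy to confirm with any correctly rounded binary64 parser, e.g. Python's:

```python
assert float("1e309") == float("inf")   # beyond the binary64 range
assert float("1e308") != float("inf")   # still finite
assert 2.0**-1074 > 0.0                 # smallest positive subnormal, nonzero
assert float("1e-400") == 0.0           # far below 2**-1074: underflows to zero
```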
> Does what is proposed here seem accurate?
Can you please explicitly relate your comments to @deadalnix's PR?
I do see the numbers in the paper, but I do not see a derivation or relationship with {largest,smallest}_power_of_ten, or a way to calculate those numbers from properties of the type. The bit you cite says that the smallest positive value for binary64 is 2^-1074 = 2^-(bias + mantissa_bits - 1); that part makes sense.
The problem: floor(log10(2^-1074)) is -324, which should be the "smallest power of 10". However, smallest_power_of_ten in this repository, the python file, and the paper is the magic number -342, which I have not seen explained.
I am trying to explain mathematically why this is specifically -342, not -324 or -341 or -343 or anything else. Everything from ~"10e-324" downward already rounds to 0, including "10e-341".
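The -324 arithmetic boundary can be checked directly (Python's parser is correctly rounded for binary64):

```python
import math

# floor(log10(2**-1074)) gives the naive "smallest power of ten" of -324.
assert math.floor(math.log10(2.0**-1074)) == -324

assert float("1e-323") != 0.0  # still a representable subnormal
assert float("1e-324") == 0.0  # rounds to zero
assert float("1e-341") == 0.0  # also zero, yet -341 > -342
```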
So I am working backwards to try to understand this. As I quoted above from "Fast Number Parsing":
> We have that w × 10^−343 < 2^−1074 for all w < 2^64. Thus if the decimal exponent is smaller than -342, then the number must be considered to be zero. If the decimal exponent is greater than 308, the result must be infinite (beyond the range).
This allows us to arrive at -⌊(exponent_bias + mantissa_bits + magic_number_64) / log2(10)⌋. Because magic_number_64 needs to be the same for both f32 and f64 (i.e., it is not the bit width; plugging in 32 produces an inaccurate answer for f32), I am suggesting that magic_number_64 may be the width of the internal exponent repr (a uint64_t in https://github.com/fastfloat/fast_float/blob/1fc3ac3932220b5effaca7203bb1bb771528d256/include/fast_float/decimal_to_binary.h#L94).
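The paper's inequality itself can be verified exactly with rational arithmetic, which also shows where the 64 enters: w is bounded by 2^64 (a sketch; `Fraction` keeps everything exact):

```python
from fractions import Fraction

# w * 10**-343 < 2**-1074 for all w < 2**64: checking the largest w suffices.
w_max = 2**64 - 1
assert Fraction(w_max, 10**343) < Fraction(1, 2**1074)

# The bound is tight: at exponent -342 the largest significand is no longer
# below the smallest subnormal, so -342 cannot be unconditionally zeroed.
assert Fraction(w_max, 10**342) >= Fraction(1, 2**1074)
```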
The relationship to this PR is that it should be trivial to verify the suggested change if such a derivation of the magic number exists. My formula above can produce the number in this PR, suggesting the change is correct; I am asking whether my formula is correct.
The relation to everything else is to aid in applying this algorithm to new float types that are not specified in the paper. E.g., if my formula above is indeed correct, then GCC may have an incorrect (but still functional) smallest_power_of_ten for f16 and bf16.
@tgross35 If you believe that there is a bug in GCC, please report the issue with GCC. At this time, fast_float does not yet support 16-bit floats. This pull request by @deadalnix is not related to 16-bit floats.
Your comments are misplaced. Please take them where they belong.
Let me be clear. This PR is not about 16-bit floats and we are not currently supporting 16-bit floats.
GCC does support 16-bit floats, but that is the wrong place to discuss issues with GCC. Please raise the issue where it belongs.
@deadalnix Merged. Thanks.