ryu
ryu copied to clipboard
Incorrect vrIsTrailingZeros?
In d2s.c, should this call
https://github.com/ulfjack/ryu/blob/00ecce8c59ab9bdb7b8f6c3dc54ee467cfab6b43/ryu/d2s.c#L315 use q instead of q-1?
As I understand it, the code uses q' = max(0, q-1) to make sure that the loop below is executed at least once, which is required to get a correct value for lastRemovedDigit which in turn is required to correctly round the result. Later, the values of vrIsTrailingZeros and lastRemovedDigit are used to determine if the exact number ends in ...5000...0 here: https://github.com/ulfjack/ryu/blob/00ecce8c59ab9bdb7b8f6c3dc54ee467cfab6b43/ryu/d2s.c#L378
Now, might it be possible that vrIsTrailingZeros is missing one digit, i.e.
missed digit?
v
...5?000...0
^ ~~~~~~^
lastRemovedDigit vrIsTrailingZeros
I haven't found a counter-example... so it's probably me, who is missing something...
NB: the e2 >= 0 branch is using multipleOfPowerOf5(mv, q).
Related: In f2s.c,
https://github.com/ulfjack/ryu/blob/00ecce8c59ab9bdb7b8f6c3dc54ee467cfab6b43/ryu/f2s.c#L195
Shouldn't this use q-1 instead of q? In this case vrIsTrailingZeros includes the value of the last removed digit, but shouldn't it only include the digits after the last removed digit?
lastRemovedDigit
v
...5000...0
~~~~~~~^
vrIsTrailingZeros
I have replaced the assignment above with
vrIsTrailingZeros = (q == 0) || multipleOfPowerOf5(mv, q - 1);
and tested all single-precision numbers: It doesn't affect the output. So, again, I'm probably missing something here...
NB: the e2 < 0 branch is using multipleOfPowerOf2(mv, q - 1) here.
The short answer is no.
We have three values vm, vr, and vp. For vm and vp, the code checks whether all q removed digits were 0, so it needs to check if they are multiples of 10^q. For vr, the code checks whether vr ends in 500...0, where these are the last q digits, so it only checks whether there are q-1 trailing 0s, i.e., whether vr is a multiple of 10^(q-1). Of course, it then also needs to check whether the qth digit is 5, which happens below.
The other branch should also use q - 1, although it's possible that there is a reason why it doesn't, except I can't remember it. If the current code is wrong, then it should be possible to construct a case by reasoning backwards from vr / mv to obtain a bit pattern that would cause a different outcome for this branch.
We have three values vm, vr, and vp. For vm and vp, the code checks whether all q removed digits were 0, so it needs to check if they are multiples of 10^q. For vr, the code checks whether vr ends in 500...0, where these are the last q digits, so it only checks whether there are q-1 trailing 0s, i.e., whether vr is a multiple of 10^(q-1). Of course, it then also needs to check whether the qth digit is 5, which happens below.
This seems to be the case for the single-precision implementation.
But in the double-precision case, we remove one digit less (q' = q - 1), so that the loop is executed at least once to get a correct value for lastRemovedDigit. So
vr = ? ? ? ? R X r r ... r
vm = ? ? ? ? z z z z ... z
vp = ? ? ? ? w w w w ... w
^ ^
q q'
Here R will be the lastRemovedDigit and we need to know whether [X r r ... r] are all zeros, i.e. whether vr is divisible by 10^(q-1), but the code uses 10^(q'-1) = 10^(q-2) and so, if all the rs are zero, but X is non-zero, vrIsTrailingZeros is wrong?
If the current code is wrong, then it should be possible to construct a case by reasoning backwards from vr / mv to obtain a bit pattern that would cause a different outcome for this branch.
I tried to do exactly that, but didn't get very far...
Ah, I should have noted that my q' is actually the same as q in the code. It might be confusing, sorry for that.
I do have a few double-precision test cases which do not correctly round if q-1 is used in the e2 >= 0 branch, i.e.
-- vrIsTrailingZeros = multipleOfPowerOf5(mv, q);
++ vrIsTrailingZeros = (q == 0) || multipleOfPowerOf5(mv, q - 1);
and the test cases:
2.0919495182368195e+19 => 2.0919495182368194e+19
2.6760179287532483e+19 => 2.6760179287532482e+19
3.2942957306323907e+19 => 3.2942957306323906e+19
3.9702293349085635e+19 => 3.9702293349085634e+19
4.0647939013152195e+19 => 4.0647939013152194e+19
It doesn't matter what the exact value of q is, we need to use q for vm and vp, and q-1 for vr. This is 'just' the base case, we then incrementally update the flags during the loop as we remove additional digits.
Let's say we remove n digits total. When we start, we don't know what the value of n is; however, we know that n >= q. So we remove q digits, and then compute some flags. If we then remove more digits (n > q), we update the flags in each step. At least, that's what we do in the general case - note that it's rare that the flags are true after removing q digits.
I'm sure there's a reason why we use q rather than q-1 in that case, but I can't say even after looking at the code and my own paper. :-/
Since you're bringing this up, I wonder if the code would be faster if we computed the flags after we know n rather than before. I have since also found a slightly different formulation of the check for exactly 0.5-ness which might lead to slightly simpler code.
It doesn't matter what the exact value of q is, we need to use q for vm and vp, and q-1 for vr.
and we need to know the value of the last removed digit.
But in the double-precision implementation, lastRemovedDigit is initialized to 0, which might be incorrect.
The correct value of lastRemovedDigit is then computed in the first loop. q' = q - 1 is chosen exactly because we need to have the correct value of lastRemovedDigit and vrIsTrailingZeros to correctly round the result and this choice makes sure that the loop is executed at least once.
So, for
vr = ? ? ? ? R X r r ... r
vm = ? ? ? ? z z z z ... z
vp = ? ? ? ? w w w w ... w
^ ^
q q'
the initial value of lastRemovedDigit is actually wrong if X != 0 (because we only test the last q' - 1 decimal digits) and consequently the value of lastRemovedDigit might be wrong in all subsequent iterations.
That being said, I'm a bit confused right now :-) and will take a look at your reasoning again tomorrow. I'm sure you are correct here.
If we know that n > q, and the loop will run at least once, then it's correct if we check q digits for 0, rather than q - 1, otherwise the number cannot end in 500...0, because that's n - 1 >= q zeroes.
If we know that n > q, and the loop will run at least once, then it's correct if we check q digits for 0, rather than q - 1, otherwise the number cannot end in 500...0, because that's n - 1 >= q zeroes.
I'm not sure what you mean by this.
But let me make one last attempt to explaing my reasoning here.
The current implementation of the double-precision version basically works like this: (The important part here is the initialization of lastRemovedDigit and the validity of the flags before and after the first iteration.)
if (e2 >= 0) {
vr, vm, vp = (mv, mm, mp) * 2^e2
q_prime = log10Pow2(e2)
q = max(0, q_prime - 1)
vrIsTrailingZeros = multipleOfPowerOf5(mv, q) // vr % 10^q == 0?
vmIsTrailingZeros = multipleOfPowerOf5(mm, q)
vpIsTrailingZeros = multipleOfPowerOf5(mp, q)
} else {
vr, vm, vp = (mv, mm, mp) * 5^(-e2)
q_prime = log10Pow5(-e2)
q = max(0, q_prime - 1)
vrIsTrailingZeros = multipleOfPowerOf2(mv, q - 1) // vr % 10^(q - 1) == 0?
vmIsTrailingZeros = multipleOfPowerOf2(mm, q)
vpIsTrailingZeros = multipleOfPowerOf2(mp, q)
}
vr /= 10^q // Remove q digits.
vm /= 10^q
vp /= 10^q
vp -= vpIsTrailingZeros
lastRemovedDigit = 0
// This loop is executed at least once if q_prime > 0
while (vm / 10 < vp / 10) {
vmIsTrailingZeros &= vm % 10 == 0
vrIsTrailingZeros &= lastRemovedDigit == 0
lastRemovedDigit = vr % 10
vr /= 10
vm /= 10
vp /= 10
}
[round...]
If the loop is never executed, then at [round...], lastRemovedDigit = 0 and vrIsTrailingZeros = true, which are the correct values.
Otherwise,
In the e2 < 0 branch: we check whether the last q - 1 decimal digits of vr are 0 and then remove q decimal digits from vr. The actual value of the q-th digit is then lost and it will never be checked in the loop:
lastRemovedDigit gets initialized to 0, which means that in the first iteration,
vrIsTrailingZeroswill be unchanged andvr % 10is assigned tolastRemovedDigitand- Remove a single decimal digit from
vr,vm,vp.
Now lastRemovedDigit has the correct value, but vrIsTrailingZeros basically ignores the value of the q-th removed digit. It might therefore be invalid in all subsequent iterations and particularly at [round...], where we need the correct values to properly round the result.
(Step 1. above maybe might be interpreted as switching from vrIsTrailingZeros to vrPrevIsTrailingZeros.)
If, however, the code would use q instead of q - 1 --- as the e2 >= 0 branch does --- the q-th digit would not be ignored and vrIsTrailingZeros would have the correct value after the first loop iteration.
I hope this makes some sense now...
Ah, got you. The tests all pass when I replace q - 1 by q in this case, so clearly we don't have cases that trigger any misbehavior.
I suspect that the last digits of the decimal representation can never be 5x0000 for x != 0. I remember looking at the last few digits for a few cases, and being surprised that there were only specific patterns possible.
On the other hand, it may also be an actual bug, but if so, it must be extremely rare. There were some people who ran a few billion tests against another implementation and did not find any differences. (I only ran a few million...)
I suspect that the last digits of the decimal representation can never be 5x0000 for x != 0. I remember looking at the last few digits for a few cases, and being surprised that there were only specific patterns possible.
Ah yes, this seems to be the case here and the code seems to be working:
In the e2 < 0 branch, vrIsTrailingZeros = (mv % 2^(q-1) == 0) can only be a false positive. For this to be true, we need
mv = N * 2^q + 2^(q-1) = 2^(q-1) * (2N + 1).
and since mv = 4 * m2, we need q >= 3. It then follows that
vr = floor( mv * 5^-e2 / 10^q ) = floor( (2N + 1) 5^-(e2 + q) / 2 ).
Now every 5^k, k >= 2, can be written as 5^k = 100 x + 25, from which it follows that the product of an odd number and a power of 5 can be written as either 100 x + 25 or 100 x + 75. This means
vr = floor((100 x + 25) / 2) = 50 x + 12 or
vr = floor((100 x + 75) / 2) = 50 x + 37
i.e., the first removed digit is either 2 or 7 and, in the rounding step, either lastRemovedDigit != 5 (in which case vrIsTrailingZeros is unused) or vrIsTrailingZeros = false.
I'm not sure if this is a complete proof, but it seems that the code is working.
Wow! Nice work! That would seem to imply that we can remove that part, IIUC?
It means that the code is working as is for mantissas of the form (2N+1) 2^(q-1). I'm not sure which part you are referring to here.
I still think we should use q instead of q-1 in the e2 < 0 branch. At least for consistency. And it would allow us to remove the (q <= 1) special case.
BTW, I'd like to know what your "different formulation of the check for exactly 0.5-ness" looks like. Ryu is so great because its so simple and so fast. Would be nice to make it even simpler :-)
-
Should commit https://github.com/ulfjack/ryu/commit/159ce150fa94e0229f3e02ecc6f4243f21388c0b have updated the comment immediately above, https://github.com/ulfjack/ryu/blob/159ce150fa94e0229f3e02ecc6f4243f21388c0b/ryu/d2s.c#L309-L315 which mentions
q - 1repeatedly? -
Should f2s.c's analogous code, https://github.com/ulfjack/ryu/blob/159ce150fa94e0229f3e02ecc6f4243f21388c0b/ryu/f2s.c#L231-L232 be changed accordingly? (Apologies if this question has been answered above.)
-
Yes
-
This seems to be somewhat more complicated. I think it should use
qin the case where we don't explicitly computelastRemovedDigit(in this case the code works exactly like the double-precision implementation) and otherwise it should useq-1. And the code in thee2 >= 0branch should use these values, too. (My initial comment above was incorrect.) - However, for all possiblefloats, the current implementation generates the same output as the double-conversion library.
@abolz I bet that using of q will reduce probability of taking the slower branch with 2 cycles of rounding.