ryu Incorrect vrIsTrailingZeros?

trafficstars

In d2s.c, should this call https://github.com/ulfjack/ryu/blob/00ecce8c59ab9bdb7b8f6c3dc54ee467cfab6b43/ryu/d2s.c#L315 use q instead of q-1?

As I understand it, the code uses q' = max(0, q-1) to make sure that the loop below is executed at least once, which is required to get a correct value for lastRemovedDigit which in turn is required to correctly round the result. Later, the values of vrIsTrailingZeros and lastRemovedDigit are used to determine if the exact number ends in ...5000...0 here: https://github.com/ulfjack/ryu/blob/00ecce8c59ab9bdb7b8f6c3dc54ee467cfab6b43/ryu/d2s.c#L378

Now, might it be possible that vrIsTrailingZeros is missing one digit, i.e.

                missed digit?
                v
            ...5?000...0
               ^ ~~~~~~^
lastRemovedDigit       vrIsTrailingZeros

I haven't found a counter-example... so it's probably me, who is missing something...

NB: the e2 >= 0 branch is using multipleOfPowerOf5(mv, q).

Related: In f2s.c, https://github.com/ulfjack/ryu/blob/00ecce8c59ab9bdb7b8f6c3dc54ee467cfab6b43/ryu/f2s.c#L195 Shouldn't this use q-1 instead of q? In this case vrIsTrailingZeros includes the value of the last removed digit, but shouldn't it only include the digits after the last removed digit?

lastRemovedDigit
               v
            ...5000...0
               ~~~~~~~^
                      vrIsTrailingZeros

I have replaced the assignment above with

vrIsTrailingZeros = (q == 0) || multipleOfPowerOf5(mv, q - 1);

and tested all single-precision numbers: It doesn't affect the output. So, again, I'm probably missing something here...

NB: the e2 < 0 branch is using multipleOfPowerOf2(mv, q - 1) here.

Jun 01 '19 10:06 abolz

The short answer is no.

We have three values vm, vr, and vp. For vm and vp, the code checks whether all q removed digits were 0, so it needs to check if they are multiples of 10^q. For vr, the code checks whether vr ends in 500...0, where these are the last q digits, so it only checks whether there are q-1 trailing 0s, i.e., whether vr is a multiple of 10^(q-1). Of course, it then also needs to check whether the qth digit is 5, which happens below.

The other branch should also use q - 1, although it's possible that there is a reason why it doesn't, except I can't remember it. If the current code is wrong, then it should be possible to construct a case by reasoning backwards from vr / mv to obtain a bit pattern that would cause a different outcome for this branch.

Jun 01 '19 12:06 ulfjack

We have three values vm, vr, and vp. For vm and vp, the code checks whether all q removed digits were 0, so it needs to check if they are multiples of 10^q. For vr, the code checks whether vr ends in 500...0, where these are the last q digits, so it only checks whether there are q-1 trailing 0s, i.e., whether vr is a multiple of 10^(q-1). Of course, it then also needs to check whether the qth digit is 5, which happens below.

This seems to be the case for the single-precision implementation.

But in the double-precision case, we remove one digit less (q' = q - 1), so that the loop is executed at least once to get a correct value for lastRemovedDigit. So

vr = ? ? ? ? R X r r ... r
vm = ? ? ? ? z z z z ... z
vp = ? ? ? ? w w w w ... w
             ^ ^
             q q'

Here R will be the lastRemovedDigit and we need to know whether [X r r ... r] are all zeros, i.e. whether vr is divisible by 10^(q-1), but the code uses 10^(q'-1) = 10^(q-2) and so, if all the rs are zero, but X is non-zero, vrIsTrailingZeros is wrong?

If the current code is wrong, then it should be possible to construct a case by reasoning backwards from vr / mv to obtain a bit pattern that would cause a different outcome for this branch.

I tried to do exactly that, but didn't get very far...

Jun 01 '19 12:06 abolz

Ah, I should have noted that my q' is actually the same as q in the code. It might be confusing, sorry for that.

I do have a few double-precision test cases which do not correctly round if q-1 is used in the e2 >= 0 branch, i.e.

-- vrIsTrailingZeros = multipleOfPowerOf5(mv, q);
++ vrIsTrailingZeros = (q == 0) || multipleOfPowerOf5(mv, q - 1);

and the test cases:

2.0919495182368195e+19 => 2.0919495182368194e+19
2.6760179287532483e+19 => 2.6760179287532482e+19
3.2942957306323907e+19 => 3.2942957306323906e+19
3.9702293349085635e+19 => 3.9702293349085634e+19
4.0647939013152195e+19 => 4.0647939013152194e+19

Jun 01 '19 13:06 abolz

It doesn't matter what the exact value of q is, we need to use q for vm and vp, and q-1 for vr. This is 'just' the base case, we then incrementally update the flags during the loop as we remove additional digits.

Let's say we remove n digits total. When we start, we don't know what the value of n is; however, we know that n >= q. So we remove q digits, and then compute some flags. If we then remove more digits (n > q), we update the flags in each step. At least, that's what we do in the general case - note that it's rare that the flags are true after removing q digits.

I'm sure there's a reason why we use q rather than q-1 in that case, but I can't say even after looking at the code and my own paper. :-/

Since you're bringing this up, I wonder if the code would be faster if we computed the flags after we know n rather than before. I have since also found a slightly different formulation of the check for exactly 0.5-ness which might lead to slightly simpler code.

Jun 01 '19 13:06 ulfjack

It doesn't matter what the exact value of q is, we need to use q for vm and vp, and q-1 for vr.

and we need to know the value of the last removed digit.

But in the double-precision implementation, lastRemovedDigit is initialized to 0, which might be incorrect.

The correct value of lastRemovedDigit is then computed in the first loop. q' = q - 1 is chosen exactly because we need to have the correct value of lastRemovedDigit and vrIsTrailingZeros to correctly round the result and this choice makes sure that the loop is executed at least once.

So, for

vr = ? ? ? ? R X r r ... r
vm = ? ? ? ? z z z z ... z
vp = ? ? ? ? w w w w ... w
             ^ ^
             q q'

the initial value of lastRemovedDigit is actually wrong if X != 0 (because we only test the last q' - 1 decimal digits) and consequently the value of lastRemovedDigit might be wrong in all subsequent iterations.

That being said, I'm a bit confused right now :-) and will take a look at your reasoning again tomorrow. I'm sure you are correct here.

Jun 01 '19 15:06 abolz

If we know that n > q, and the loop will run at least once, then it's correct if we check q digits for 0, rather than q - 1, otherwise the number cannot end in 500...0, because that's n - 1 >= q zeroes.

Jun 01 '19 16:06 ulfjack

If we know that n > q, and the loop will run at least once, then it's correct if we check q digits for 0, rather than q - 1, otherwise the number cannot end in 500...0, because that's n - 1 >= q zeroes.

I'm not sure what you mean by this.

But let me make one last attempt to explaing my reasoning here.

The current implementation of the double-precision version basically works like this: (The important part here is the initialization of lastRemovedDigit and the validity of the flags before and after the first iteration.)

if (e2 >= 0) {
    vr, vm, vp = (mv, mm, mp) * 2^e2
    q_prime = log10Pow2(e2)
    q = max(0, q_prime - 1)
    vrIsTrailingZeros = multipleOfPowerOf5(mv, q)       // vr % 10^q == 0?
    vmIsTrailingZeros = multipleOfPowerOf5(mm, q)
    vpIsTrailingZeros = multipleOfPowerOf5(mp, q)
} else {
    vr, vm, vp = (mv, mm, mp) * 5^(-e2)
    q_prime = log10Pow5(-e2)
    q = max(0, q_prime - 1)
    vrIsTrailingZeros = multipleOfPowerOf2(mv, q - 1)   // vr % 10^(q - 1) == 0?
    vmIsTrailingZeros = multipleOfPowerOf2(mm, q)
    vpIsTrailingZeros = multipleOfPowerOf2(mp, q)
}

vr /= 10^q // Remove q digits.
vm /= 10^q
vp /= 10^q

vp -= vpIsTrailingZeros

lastRemovedDigit = 0

// This loop is executed at least once if q_prime > 0
while (vm / 10 < vp / 10) {
    vmIsTrailingZeros &= vm % 10 == 0
    vrIsTrailingZeros &= lastRemovedDigit == 0
    lastRemovedDigit = vr % 10
    vr /= 10
    vm /= 10
    vp /= 10
}

[round...]

If the loop is never executed, then at [round...], lastRemovedDigit = 0 and vrIsTrailingZeros = true, which are the correct values.

Otherwise,

In the e2 < 0 branch: we check whether the last q - 1 decimal digits of vr are 0 and then remove q decimal digits from vr. The actual value of the q-th digit is then lost and it will never be checked in the loop:

lastRemovedDigit gets initialized to 0, which means that in the first iteration,

vrIsTrailingZeros will be unchanged and
vr % 10 is assigned to lastRemovedDigit and
Remove a single decimal digit from vr,vm,vp.

Now lastRemovedDigit has the correct value, but vrIsTrailingZeros basically ignores the value of the q-th removed digit. It might therefore be invalid in all subsequent iterations and particularly at [round...], where we need the correct values to properly round the result.

(Step 1. above maybe might be interpreted as switching from vrIsTrailingZeros to vrPrevIsTrailingZeros.)

If, however, the code would use q instead of q - 1 --- as the e2 >= 0 branch does --- the q-th digit would not be ignored and vrIsTrailingZeros would have the correct value after the first loop iteration.

I hope this makes some sense now...

Jun 02 '19 15:06 abolz

Ah, got you. The tests all pass when I replace q - 1 by q in this case, so clearly we don't have cases that trigger any misbehavior.

I suspect that the last digits of the decimal representation can never be 5x0000 for x != 0. I remember looking at the last few digits for a few cases, and being surprised that there were only specific patterns possible.

On the other hand, it may also be an actual bug, but if so, it must be extremely rare. There were some people who ran a few billion tests against another implementation and did not find any differences. (I only ran a few million...)

Jun 02 '19 17:06 ulfjack

I suspect that the last digits of the decimal representation can never be 5x0000 for x != 0. I remember looking at the last few digits for a few cases, and being surprised that there were only specific patterns possible.

Ah yes, this seems to be the case here and the code seems to be working:

In the e2 < 0 branch, vrIsTrailingZeros = (mv % 2^(q-1) == 0) can only be a false positive. For this to be true, we need

mv = N * 2^q + 2^(q-1) = 2^(q-1) * (2N + 1).

and since mv = 4 * m2, we need q >= 3. It then follows that

vr = floor( mv * 5^-e2 / 10^q ) = floor( (2N + 1) 5^-(e2 + q) / 2 ).

Now every 5^k, k >= 2, can be written as 5^k = 100 x + 25, from which it follows that the product of an odd number and a power of 5 can be written as either 100 x + 25 or 100 x + 75. This means

vr = floor((100 x + 25) / 2) = 50 x + 12   or
vr = floor((100 x + 75) / 2) = 50 x + 37

i.e., the first removed digit is either 2 or 7 and, in the rounding step, either lastRemovedDigit != 5 (in which case vrIsTrailingZeros is unused) or vrIsTrailingZeros = false.

I'm not sure if this is a complete proof, but it seems that the code is working.

Jun 03 '19 11:06 abolz

Wow! Nice work! That would seem to imply that we can remove that part, IIUC?

Jun 03 '19 18:06 ulfjack

It means that the code is working as is for mantissas of the form (2N+1) 2^(q-1). I'm not sure which part you are referring to here.

I still think we should use q instead of q-1 in the e2 < 0 branch. At least for consistency. And it would allow us to remove the (q <= 1) special case.

BTW, I'd like to know what your "different formulation of the check for exactly 0.5-ness" looks like. Ryu is so great because its so simple and so fast. Would be nice to make it even simpler :-)

Jun 05 '19 05:06 abolz

Should commit https://github.com/ulfjack/ryu/commit/159ce150fa94e0229f3e02ecc6f4243f21388c0b have updated the comment immediately above, https://github.com/ulfjack/ryu/blob/159ce150fa94e0229f3e02ecc6f4243f21388c0b/ryu/d2s.c#L309-L315 which mentions q - 1 repeatedly?
Should f2s.c's analogous code, https://github.com/ulfjack/ryu/blob/159ce150fa94e0229f3e02ecc6f4243f21388c0b/ryu/f2s.c#L231-L232 be changed accordingly? (Apologies if this question has been answered above.)

Jun 06 '19 00:06 StephanTLavavej

Yes
This seems to be somewhat more complicated. I think it should use q in the case where we don't explicitly compute lastRemovedDigit (in this case the code works exactly like the double-precision implementation) and otherwise it should use q-1. And the code in the e2 >= 0 branch should use these values, too. (My initial comment above was incorrect.) - However, for all possible floats, the current implementation generates the same output as the double-conversion library.

Jun 06 '19 10:06 abolz

@abolz I bet that using of q will reduce probability of taking the slower branch with 2 cycles of rounding.

Jun 06 '19 20:06 plokhotnyuk

ryu ryu copied to clipboard

Incorrect vrIsTrailingZeros?

ryu
ryu copied to clipboard