sleef Exp and pow return infinity too early

Describe the bug The exp function (for SIMD and pure C) returns infinity for all values past a certain threshold, but this threshold is slightly lower than it should be. Looking at the code for double precision, infinity is returned for all x > 709.78271114955742909217217426. However, there are values past this limit whose exponential can still be represented with a finite double. All standard libraries I've tested use the following, slightly larger, threshold instead x > 709.782712893384 (0x1.62e42fefa39efp+9 in binary). This doesn't appear to be an issue for the single precision implementation.

This same threshold is also used in the pow function, so the bug is present there as well.

Command lines, logs and environment All the build steps, environment and configuration are the same as in my other issue, #570, though they shouldn't matter in this case.

To Reproduce

int main()
{
	double x = 709.782712;
	std::cout << "libc:  " << exp(x) << '\n';
	std::cout << "Sleef: " << Sleef_exp_u10(x) << '\n';
}

This produces the following output on my machine

libc:  1.79769e+308
Sleef: inf

Fixing this issue might be as simple as just changing the constant in the implementation, provided the series is still accurate past this point.

Oct 31 '24 02:10 MattyMuir

Hello @MattyMuir, thanks for reporting this!

We will try to adjust the threshold so it is a little closer to general expectation. I also think that constants should be represented using hexfloat, they currently use too many digits, which I don't think is necessary.

Please consider the 2 following notes just FYI

Note 1: I take back my original note, the ieee754 standard does mandate overflow for exp and pow (table 9.1) in the way defined in section 7.4:

The overflow exception shall be signaled if and only if the destination format’s largest finite number is exceeded in magnitude by what would have been the rounded floating-point result

Note 2: Other math libraries like Arm Optimized routines have a more elaborate approach, it evacuates the early overflow to a separate branch where it is fixed, see https://github.com/ARM-software/optimized-routines/blob/master/math/aarch64/sv_exp_1u5.c.

Oct 31 '24 09:10 blapie

Just edited my notes above, as the first one was mistaken.

Oct 31 '24 10:10 blapie

Just had a look at some of the single precision functions. Looks like similar bugs are present in log1pf, sinhf, and coshf. I think the tester needs to be updated to catch these, especially if overflow is mandated by the IEEE standard.

Nov 16 '24 21:11 MattyMuir

Hello @MattyMuir, Thank you for reporting these too. It does seem to impact several routines indeed. I'll start looking into fixing implementation and updating tests shortly.

Nov 18 '24 09:11 blapie

I'll soon upload a PR to fix this and add some tests. I just realised #523 reported the same issue, and confirms that the ulp error is still 1.0 in the interval between the old and new thresholds.

Nov 20 '24 17:11 blapie

Similarly the log1pf issue was pointed out in #473.

Nov 21 '24 09:11 blapie

Hello @shibatch, We are looking at a range of functions for which SLEEF implements an early overflow bound (as in earlier than what the standard recommends). I suspect these bounds were set to maintain accuracy, could you please confirm that? If so we need to be careful if we are going to update them. Were there any other reasons driving these choices of bounds?

Luckily it turns out that exp still provides 1ULP when threshold is changed to one meeting the standard, but it is notoriously hard to assess accuracy for bivariate functions like pow or functions for which the number of extra finite point to test is too large to be computed exhaustively (e.g. log1p).

I will run tests as thoroughly as I can, but I would appreciate your insight on this.

If you have specific comments on exp or pow please comment on #523, and on log1p(f) please comment on #473.

Nov 21 '24 15:11 blapie

Hello, I don't see any problem in changing the overflow threshold. I am aware of pretty many minor accuracy problems, and I have been fixing them bit by bit. On the other hand, when I heard requests from users, they were mainly talking about how to make the computation faster rather than about those minor accuracy issues, so fixing them was a low priority.

Nov 21 '24 15:11 shibatch

Ok, good to know. I think it is fair, they are relatively minor issues. Thanks!

Nov 21 '24 16:11 blapie

I've just created a PR to fix the threshold. As indicated in #523 we can confidently say that the change does not affect overall accuracy in exp.

I have checked pow accuracy with our in-house ulp error assessment program, and here is what I get

...
Largest error so far is 0.50511 at [0x1.7fed001e5f0edp-1, 0x1.1b5ce4d1fb0aep+11] ([0x3fe7fed001e5f0ed, 0x40a1b5ce4d1fb0ae])
  Got  0x1.6e9d97108d4e2p-942
  Want 0x1.6e9d97108d4e1p-942
Largest error so far is 0.57493 at [0x1.7f136e35a1af6p-1, 0x1.2c2f3c91cf6c5p+11] ([0x3fe7f136e35a1af6, 0x40a2c2f3c91cf6c5])
  Got  0x1.ee75e5d42b4bp-1006
  Want 0x1.ee75e5d42b4afp-1006
Largest error so far is 0.59458 at [0x1.7e7a67798b72dp-1, 0x1.e0157ee6672fbp+10] ([0x3fe7e7a67798b72d, 0x409e0157ee6672fb])
  Got  0x1.fb663ae3f3a63p-809
  Want 0x1.fb663ae3f3a62p-809
Largest error so far is 0.65361 at [0x1.7f5c8e80a3cf7p-1, 0x1.235db085e49b7p+11] ([0x3fe7f5c8e80a3cf7, 0x40a235db085e49b7])
  Got  0x1.f992991fa3f97p-974
  Want 0x1.f992991fa3f96p-974
Largest error so far is 0.74696 at [0x1.7ff1b57d71188p-1, 0x1.2e8d51b04ab8p+11] ([0x3fe7ff1b57d71188, 0x40a2e8d51b04ab80])
  Got  0x1.e75a8477de9e7p-1006
  Want 0x1.e75a8477de9e6p-1006
Largest error so far is 0.74696 at [0x1.7ff1b57d71188p-1, 0x1.2e8d51b04ab8p+11] ([0x3fe7ff1b57d71188, 0x40a2e8d51b04ab80])
  Got  0x1.e75a8477de9e7p-1006
  Want 0x1.e75a8477de9e6p-1006

@shibatch Can you please confirm wether the 1.0ULP threshold include the extra 0.5ULP offset introduced by the rounding of the input value (worst case in round to nearest)?

If so then, our result indicates that there are a few points above 1ULP.
If it does not, then the effective accuracy is 1.5ULP, so our result simply confirms that with high confidence pow is under 1.0ULP(+0.5ULP) error.

These max are independent of the change in oflow threshold, since |ylogx| < 709, for instance

x = 7.498910e-01
y = 2.420416e+03
ylogx = -6.966623e+02 // |ylogx| < 709
ret = 2.776058e-303 (0x1.e75a8477de9e7p-1006) versus exact return value = 0x1.e75a8477de9e6p-1006

Note that it is quite difficult to isolate cases for which ylog(x) > thres.

Nov 28 '24 16:11 blapie

I have confirmed that the error exceeds 1ulp when arguments are (0x1.7fed001e5f0edp-1,0x1.1b5ce4d1fb0aep+11).

pow : arg0 = 0x1.7fed001e5f0edp-1 (0.749855), arg1 = 0x1.1b5ce4d1fb0aep+11 (2266.9), ulp = 1.00511, t = 3.852255144842898e-284, c = 3.85226e-284

Nov 29 '24 05:11 shibatch

Thanks for confirming. Would you be able to confirm these 2 affirmations?

So the u10 routines are effectively 1.5ULP (u35 are effectively 4ULP) when considering worst-case rounding of the input?
Do you also confirm the maximum reported error of 0.74696 ULP (+0.5ULP)?

Largest error so far is 0.74696 at [0x1.7ff1b57d71188p-1, 0x1.2e8d51b04ab8p+11] ([0x3fe7ff1b57d71188, 0x40a2e8d51b04ab80])
  Got  0x1.e75a8477de9e7p-1006
  Want 0x1.e75a8477de9e6p-1006

Nov 29 '24 14:11 blapie

Rounding of inputs is not taken into account in calculating accuracy. If we are going to take that into account, what happens when we give a very large argument for sin?

Nov 29 '24 14:11 shibatch

I have confirmed that the error exceeds 1ulp when arguments are (0x1.7fed001e5f0edp-1,0x1.1b5ce4d1fb0aep+11).

pow : arg0 = 0x1.7fed001e5f0edp-1 (0.749855), arg1 = 0x1.1b5ce4d1fb0aep+11 (2266.9), ulp = 1.00511, t = 3.852255144842898e-284, c = 3.85226e-284

Testing with these same arguments in the latest version also exceeds 1ULP error. It looks like this inaccuracy is unrelated to the new PR.

Nov 29 '24 15:11 MattyMuir