design icon indicating copy to clipboard operation
design copied to clipboard

Determinism with non-NaN results?

Open KloudKoder opened this issue 3 years ago • 9 comments

As I read the spec: if neither the inputs to nor the output from a floating-point instruction is NaN, then the output is deterministic (and agrees across all platforms). Clearly, this isn't the case with transcendental functions like sine, but you don't seem to implement those, I think this assertion is correct. Is it? If so, perhaps you could make that clearer in the spec.

KloudKoder avatar Oct 11 '20 02:10 KloudKoder

Yes, it's correct. Wasm's floating-point is deterministic, other than NaN bit patterns.

sunfishcode avatar Oct 11 '20 03:10 sunfishcode

@sunfishcode "and agrees across all platforms"? Remember, this includes weird shit like negative zero and infinity and denormals, none of which are NaN. My guess is that they intended for this to be the case, but no one has actually tested it very well. Hopefully I'm wrong. Hopefully they also didn't make the same mistake as GCC, which on 32-bit platforms had inconsistent floating-point behavior due to accidental mixing of 64-bit and 80-bit precisions.

KloudKoder avatar Oct 11 '20 14:10 KloudKoder

Yes. The spec testsuite has tests for negative zero, infinity, and subnormals, cases which would round differently in 80-bit precision, cases which would round differently on x87's double-precision mode, and more.

sunfishcode avatar Oct 11 '20 14:10 sunfishcode

@sunfishcode OK I'm 90% convinced, so thank you for the detailed feedback. This stuff really matters for scalability and accuracy, but I don't need to discuss all that unless someone wants to know.

I did notice that there's a square root instruction (fsqrt), where it wouldn't be unreasonable to expect cross-platform inconsistencies in some cases, depending upon the algorithm used under the hood to converge on an answer. I have deep suspicions of any instruction that returns rational approximations of irrational numbers.

My concern is not unfounded. In 2014 (see below), it was discovered that Intel's sine (FSIN) function might be accurate to only 4 bits of significance in certain cases. Under such a scenario, the right way to fix it would probably be to leave the existing bad behavior as-is for the sake of consistency, then add another function which actually works.

In theory, square roots are trivially easy to compute, but things could go awry if one is using a Taylor or binomial series which might require much more dynamic range than the old divide-and-average method, or if the circuit designer didn't have a rigorous comprehension of the error term.

For these and other reasons, it would be useful to define fsqrt (WebAssembly) in terms of FSQRT (X86, stating even the CPU model if necessary). That way, if something were to go wrong with some inputs, there would be no spec violation. Instead, it could be fixed with "fsqrt2" or something that doesn't have the problem. In no event does it return a "square root". But this is just documentary pedanticism.

The real concern here is just that I wonder whether non-NaN inputs to fsqrt generate the same result on Intel, AMD, nVidia, ARM, etc. One could sweep the entire 32-bit input space in a short time. That would at least provide substantial confidence that the 64-bit space is also correct. I assume that's been done because it's so obvious, but I don't know, and it's less obvious that any comparison would have been done cross-platform.

I also wonder whether divide-and-average has swept the same space for sake of comparison with fsqrt. If they produce identical results, then all is well. If they produce trivially different results, then the documentary pedanticism above actually matters. If they sometimes produce wildly different results, then we have another manifestation of the FSIN problem (really unlikely, but who knows).

https://randomascii.wordpress.com/2014/10/09/intel-underestimates-error-bounds-by-1-3-quintillion

KloudKoder avatar Oct 12 '20 00:10 KloudKoder

Interesting question. Here's what we know:

  • IEEE 754 implementations of sqrt are required to operate as if they computed the infinitely precise result and then rounded it to the output precision according to the applicable rounding mode.
  • Intel, ARM, and all other popular general-purpose CPU designs have sqrt instructions which claim to conform to IEEE 754. For comparison, FSIN was only ever advertised to be an approximation (which doesn't excuse the error bound being wrong, but does put it in a different light)
  • When hardware designers claim to implement IEEE 754, as a whole they take it very seriously. For example, see the response to the Pentium FDIV bug.
  • sqrt is much simpler to compute than sin, so it's reasonable to expect it can be done correctly.
  • A lot of people depend on sqrt being correct, because a lot of things are built on top of it.
  • In addition to the auto-generated tests covering all the special values (infinity, +/- zero, subnormal, etc.), the spec testsuite also includes several custom tests for sqrt, including all instances of sqrt implementation bugs that a search turned up.
  • As of this writing, there are no reports of differing behavior between platforms.

Sweeping the entire 32-bit input space to compare between hardware platforms, or to compare against divide-and-average are good ideas. I don't know that anyone has done either of these. If anyone does, please post the results somewhere!

For these and other reasons, it would be useful to define fsqrt (WebAssembly) in terms of FSQRT (X86, stating even the CPU model if necessary). That way, if something were to go wrong with some inputs, there would be no spec violation. Instead, it could be fixed with "fsqrt2" or something that doesn't have the problem. In no event does it return a "square root". But this is just documentary pedanticism.

I appreciate the idea here, however it wouldn't be practical for Wasm because, by definition, it'd be impossible to implement this kind of FSQRT on any non-x86 chip. Running on multiple architectures is a big part of the point of WebAssembly, so we wouldn't be able to use such an instruction at all. So the tradeoff is having a usable sqrt and risking some hardware someday having a bug some day, versus of not having a sqrt at all.

sunfishcode avatar Oct 12 '20 01:10 sunfishcode

@sunfishcode Brilliant deep answer. Let me see if I can squeeze in some time to do the divide-and-average comparison...

KloudKoder avatar Oct 14 '20 00:10 KloudKoder

@sunfishcode I just realized that this is a harder question than it seems.

On the old X87 FPU, FSQRT operated at full precision (generally, 80 bits, but programmable to 32 or 64). Assuming 80 bits, then truncating down to 32 or 64 would almost certainly provide the correct result subject to the rounding policy. However, various flavors of AMD64 SIMD, such as AVX, provide their own square root functions (e.g. SQRTPS) which are defined to produce only those lesser precisions (which surely are not taken from 80 bits). So in order to really test this, one would need to compare the imprecise SIMD results to the equivalently-downconverted results from 80-bit FSQRT.

I could hash all those downconverted 32-bit results. I could publish that hash here. But then someone else would need to run the equivalent WASM code on various platforms, then compare the hash.

KloudKoder avatar Oct 14 '20 02:10 KloudKoder

Promoting 32-bit or 64-bit inputs to 80-bit, computing sqrt results rounded to 80 bits, and then rounding those results down to 32 or 64 bits in a separate step suffers from double rounding, so it turns out to be worse than SQRTPS et al which produce 32-bit or 64-bit results that are only rounded once.

sunfishcode avatar Oct 14 '20 13:10 sunfishcode

@sunfishcode I see your point. Then the only solution here is divide-and-average (or square-and-average). I'd be glad to run that test and spit out 2^32 32-bit floats to a file, take its hash, and publish it here. (One could even use a high-precision library to make errors even less likely.) That would make it easy to track down any disagreements between my own results and those of someone running WASM on some other hardware. (One could just binary search for the offset of disagreement by truncating the file at different points.) If you or anyone else is willing to participate, I'll do it.

  • NaNs would need to be excluded. This is easily done, since they're identified by all ones in the exponent field. Ditto negative values.

KloudKoder avatar Oct 14 '20 17:10 KloudKoder

I believe the answers above are as much as we can say at this time. If anyone does any experiments, or if anyone learns of a floating-point implementation bug anywhere, please do post about it, here or in a new issue in this repo.

sunfishcode avatar Oct 28 '22 22:10 sunfishcode