array-api Specify correct rounding for sqrt

In accuracy.rst, correct rounding is required for add, sub, mul, and divide. I propose adding sqrt to this list. Note that sqrt is not listed later among the mathematical functions whose accuracy is not precisely defined.

Note that the IEEE Std 754-2019 for Floating-Point Arithmetic mandates correct rounding for squareRoot.

Jul 28 '24 18:07 hpkfft

Is this something that's actually the case for existing libraries/devices? If so, there shouldn't be an issue with adding it.

By the way, is your proposal for this to apply to both real and complex inputs or just to real inputs?

Jul 29 '24 21:07 asmeurer

Excellent point! My proposal is for this to apply just to real inputs. Given industry adoption of IEEE Std 754-2019, correct rounding is actually the case for existing libraries/devices. Sometimes there is an option for less accuracy (and presumably higher performance). For example, CUDA in single precision requires compiling with -prec-sqrt=true, and division requires -prec-div=true, for correct rounding. Since this document requires correctly rounded division, it equally well ought to require correctly rounded square root. [Double precision CUDA divide and square root are always correctly rounded. See the "Mathematical Functions" section of the "CUDA C++ Programming Guide".]

Jul 29 '24 23:07 hpkfft

I believe the existing accuracy requirements for element-wise arithmetic operations are only intended to apply to real inputs. For example, the usual 6 flop implementation of complex multiplication

(a + bi)(c + di) = (ac - bd) + (ad + bc)i

does not result in correctly rounded real and imaginary components. I do not believe this array standard means to prohibit this implementation, but this should be clarified.

Jul 30 '24 04:07 hpkfft

I do not believe this array standard means to prohibit this implementation

Yes, I think this is correct. As the specification advocates for IEEE 754 compliance (with some caveats; e.g., subnormals) and IEEE 754 only applies to reals, we don't have an equivalent mandate for complex number operations. Accordingly, we should make this distinction explicit.

Re: sqrt. Yes, I think this is a reasonable recommendation. The main point for the arithmetic operations being correctly rounded is to limit error accumulation. As sqrt is a fundamental operation--and more fundamental than the various transcendentals--I agree that requiring correctly rounded behavior is reasonable and should be added.

Sep 19 '24 06:09 kgryte

For those interested in the accuracy of complex operators/functions, my colleagues and I have done a comparison with different compilers/libraries: complex.pdf

Oct 02 '24 02:10 hpkfft

PR: https://github.com/data-apis/array-api/pull/882

Jan 09 '25 11:01 kgryte

By default, the CUDA compiler sets -prec-div=true, -prec-sqrt=true, and -ftz=false. https://docs.nvidia.com/cuda/floating-point/index.html#compiler-flags

The CuPy library is compiled with -ftz=true (overriding the default for this particular flag). Thanks, @leofang, for this info.

Jan 10 '25 00:01 hpkfft

Note that for IEEE square root a subnormal result can never be produced. So, a hardware mode that flushes subnormal results to zero is irrelevant.

Jan 10 '25 23:01 hpkfft