array-api icon indicating copy to clipboard operation
array-api copied to clipboard

Specify correct rounding for sqrt

Open hpkfft opened this issue 1 year ago • 5 comments

In accuracy.rst, correct rounding is required for add, sub, mul, and divide. I propose adding sqrt to this list. Note that sqrt is not listed later among the mathematical functions whose accuracy is not precisely defined.

Note that the IEEE Std 754-2019 for Floating-Point Arithmetic mandates correct rounding for squareRoot.

hpkfft avatar Jul 28 '24 18:07 hpkfft

Is this something that's actually the case for existing libraries/devices? If so, there shouldn't be an issue with adding it.

By the way, is your proposal for this to apply to both real and complex inputs or just to real inputs?

asmeurer avatar Jul 29 '24 21:07 asmeurer

Excellent point! My proposal is for this to apply just to real inputs. Given industry adoption of IEEE Std 754-2019, correct rounding is actually the case for existing libraries/devices. Sometimes there is an option for less accuracy (and presumably higher performance). For example, CUDA in single precision requires compiling with -prec-sqrt=true, and division requires -prec-div=true, for correct rounding. Since this document requires correctly rounded division, it equally well ought to require correctly rounded square root. [Double precision CUDA divide and square root are always correctly rounded. See the "Mathematical Functions" section of the "CUDA C++ Programming Guide".]

hpkfft avatar Jul 29 '24 23:07 hpkfft

I believe the existing accuracy requirements for element-wise arithmetic operations are only intended to apply to real inputs. For example, the usual 6 flop implementation of complex multiplication

(a + bi)(c + di) = (ac - bd) + (ad + bc)i

does not result in correctly rounded real and imaginary components. I do not believe this array standard means to prohibit this implementation, but this should be clarified.

hpkfft avatar Jul 30 '24 04:07 hpkfft

I do not believe this array standard means to prohibit this implementation

Yes, I think this is correct. As the specification advocates for IEEE 754 compliance (with some caveats; e.g., subnormals) and IEEE 754 only applies to reals, we don't have an equivalent mandate for complex number operations. Accordingly, we should make this distinction explicit.

Re: sqrt. Yes, I think this is a reasonable recommendation. The main point for the arithmetic operations being correctly rounded is to limit error accumulation. As sqrt is a fundamental operation--and more fundamental than the various transcendentals--I agree that requiring correctly rounded behavior is reasonable and should be added.

kgryte avatar Sep 19 '24 06:09 kgryte

For those interested in the accuracy of complex operators/functions, my colleagues and I have done a comparison with different compilers/libraries: complex.pdf

hpkfft avatar Oct 02 '24 02:10 hpkfft

PR: https://github.com/data-apis/array-api/pull/882

kgryte avatar Jan 09 '25 11:01 kgryte

By default, the CUDA compiler sets -prec-div=true, -prec-sqrt=true, and -ftz=false. https://docs.nvidia.com/cuda/floating-point/index.html#compiler-flags

The CuPy library is compiled with -ftz=true (overriding the default for this particular flag). Thanks, @leofang, for this info.

hpkfft avatar Jan 10 '25 00:01 hpkfft

Note that for IEEE square root a subnormal result can never be produced. So, a hardware mode that flushes subnormal results to zero is irrelevant.

hpkfft avatar Jan 10 '25 23:01 hpkfft