highway icon indicating copy to clipboard operation
highway copied to clipboard

NaN propagation in exponential function

Open nielskm opened this issue 3 years ago • 4 comments

When applying the Exp function to NaN on my CPU (Intel i7-1185G7 using the AVX3_DL target) the function returns 0.0 instead of the NaN that I expected (and that std::exp returns).

Is this a bug or performance-related feature of the highway Exp function?

nielskm avatar Oct 14 '22 12:10 nielskm

Hi @nielskm , we were not focusing on NaN propagation in the design of the math library. This typically comes at a (minor but nonzero) cost.

The relevant line is return IfThenElseZero(Ge(x, kLowerBound), y);. Comparison of x=NaN returns false, so we get zero from IfThenElseZero. I suppose this could be turned into something like: IfThenZeroElse(Lt(x, kLowerBound), IfThenElse(Ge(x, kLowerBound), y, x). The first comparison is to flush tiny inputs to zero, the second keeps the result or replaces with x=NaN. Are NaNs essential to your app?

jan-wassenberg avatar Oct 17 '22 16:10 jan-wassenberg

https://github.com/google/highway/blob/6bc34059186b70f61ccc77fbead83f39627d8157/hwy/contrib/math/math-inl.h#L1124-L1125

The vscalefpd instruction should handle NaNs correctly given AVX3, so correct NaN handling should be free on those targets. It should also make the current implementation of LoadExpShortRange cheaper for those targets https://github.com/google/highway/blob/6bc34059186b70f61ccc77fbead83f39627d8157/hwy/contrib/math/math-inl.h#L829-L832 https://www.felixcloutier.com/x86/VSCALEFPD.html https://uops.info/html-instr/VSCALEFPD_ZMM_ZMM_ZMM.html

EDIT: I'm not sure exactly how it's handled / I'd have to look a lot closer, but can confirm that my implementation in Julia that uses vscalefpd returns NaN/0.0/Inf as appropriate without needing any checks.

chriselrod avatar Oct 17 '22 17:10 chriselrod

Thank you both for your detailed replies! NaN propagation is not essential in my application, but I just happened to test highway Exp against std::exp and the exponential function implemented in Eigen3. Both of the others propagate NaNs, which made me wonder if this was a deliberate design choice in highway.

nielskm avatar Oct 18 '22 07:10 nielskm

@chriselrod good point about using vscale for LoadExpShortRange. We could add an AddExponent op or similar to take advantage of that, if anyone is interested. There's also vfixupimmpd for fixups when no scaling is required, but I'm not sure it's worth the cost on other platforms.

It was indeed a deliberate design choice but that can be changed if/when someone really cares about NaN propagation :)

jan-wassenberg avatar Oct 18 '22 07:10 jan-wassenberg