Fix rsqrt
Part of #888
I see 2 solutions to solve this test:
- implement rsqrt = native_divide(1, sqrt) which allows to go back to the previous working implementation
- implement rsqrt = inverseSqrt which should be fine on every platform as vulkan requirement for it is the same as OpenCL.
At the moment, sqrt and rsqrt are coming from libclc. I think we should get rid of them there, as they are just wrappers to llvm builtins, and do not implement any logic.
Turns out we already have the mapping for rsqrt = inversesqrt, I tried it out and simply removing rsqrt from clspv's libclc SOURCES resulted in clspv generating a call to the InverseSqrt extended instruction which passed the CTS (at least on the fairly small sample of hardware I can test on). Like you say the inverse sqrt reqs are the same in both specs so that should work similarly running anywhere.
I'm less sure about removing sqrt from our libclc, because it actually makes sure we return NaN if the input is negative:
; Function Attrs: alwaysinline convergent norecurse nounwind readnone
define linkonce_odr dso_local spir_func float @_Z4sqrtf(float %0) local_unnamed_addr #25 {
%2 = fcmp olt float %0, 0.000000e+00
%3 = tail call spir_func float @llvm.sqrt.f32(float %0) #48
%4 = select i1 %2, float 0x7FF8000000000000, float %3
ret float %4
}
I couldn't find anything in the CL spec that suggested this was a requirement, but the CL cts seems to rely on libm's sqrt for its reference implementation so I think it will expect this behaviour. The result of sqrt(-1) isn't defined in the vulkan spec either, but since the result is explicitly undefined in the glsl spec https://registry.khronos.org/OpenGL-Refpages/gl4/html/sqrt.xhtml I don't find it hard to imagine a platform that could hit this edge case
Keeping sqrt for that looks ok to me.
But isn't it the same thing with rsqrt? Shouldn't we have also a check for negative values there?
You're right, the ideal solution would be to include that check and then call the glsl InverseSqrt instruction. I'll need to think about how we could do that
We need to consider the fp config flag CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT as well. For this, I don't think starting with a correctly rounded reciprocal or inverse sqrt is sufficient without some extra massaging, so it's probably best for a first pass to associate the flag with extensions exposing native implementations (similar to fma).
Any algorithms relying on divide or sqrt need to be checked as to whether they're assuming correctly rounded implementations.
Does it mean you are expecting to have vulkan implementations providing correctly rounded reciprocal or inverse sqrt?
I'm expecting that some existing Vk implementations do already provide (or at least have access to native functionality to provide) correctly rounded Divide or Square Root implementations. The idea is to provide an extension making this explicit, so that Vk implementations can support it if they do happen to provide that functionality. Then clvk can query the extension in order to claim the same capability for OpenCL via CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT.
There's no similar flag in OpenCL for reciprocal or inverse sqrt, so this wouldn't apply to them.
I see.
This is an optimization in clvk. You should open an issue for it.
Let's keep this one about fixing rsqrt to bring the accuracy back to what is required by the OpenCL-CTS.
with this https://github.com/google/clspv/pull/917 we should be good to just delete sqrt and rsqrt from the builtins library
waiting on https://reviews.llvm.org/D134040 to be merged to close this issue
Should be fixed by #1017