clspv icon indicating copy to clipboard operation
clspv copied to clipboard

Fix rsqrt

Open rjodinchr opened this issue 3 years ago • 9 comments

Part of #888

I see 2 solutions to solve this test:

  • implement rsqrt = native_divide(1, sqrt) which allows to go back to the previous working implementation
  • implement rsqrt = inverseSqrt which should be fine on every platform as vulkan requirement for it is the same as OpenCL.

rjodinchr avatar Jul 28 '22 09:07 rjodinchr

At the moment, sqrt and rsqrt are coming from libclc. I think we should get rid of them there, as they are just wrappers to llvm builtins, and do not implement any logic.

rjodinchr avatar Jul 28 '22 09:07 rjodinchr

Turns out we already have the mapping for rsqrt = inversesqrt, I tried it out and simply removing rsqrt from clspv's libclc SOURCES resulted in clspv generating a call to the InverseSqrt extended instruction which passed the CTS (at least on the fairly small sample of hardware I can test on). Like you say the inverse sqrt reqs are the same in both specs so that should work similarly running anywhere.

I'm less sure about removing sqrt from our libclc, because it actually makes sure we return NaN if the input is negative:

; Function Attrs: alwaysinline convergent norecurse nounwind readnone
define linkonce_odr dso_local spir_func float @_Z4sqrtf(float %0) local_unnamed_addr #25 {
  %2 = fcmp olt float %0, 0.000000e+00
  %3 = tail call spir_func float @llvm.sqrt.f32(float %0) #48
  %4 = select i1 %2, float 0x7FF8000000000000, float %3
  ret float %4
}

I couldn't find anything in the CL spec that suggested this was a requirement, but the CL cts seems to rely on libm's sqrt for its reference implementation so I think it will expect this behaviour. The result of sqrt(-1) isn't defined in the vulkan spec either, but since the result is explicitly undefined in the glsl spec https://registry.khronos.org/OpenGL-Refpages/gl4/html/sqrt.xhtml I don't find it hard to imagine a platform that could hit this edge case

aarongreig avatar Aug 22 '22 15:08 aarongreig

Keeping sqrt for that looks ok to me. But isn't it the same thing with rsqrt? Shouldn't we have also a check for negative values there?

rjodinchr avatar Aug 22 '22 16:08 rjodinchr

You're right, the ideal solution would be to include that check and then call the glsl InverseSqrt instruction. I'll need to think about how we could do that

aarongreig avatar Aug 22 '22 16:08 aarongreig

We need to consider the fp config flag CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT as well. For this, I don't think starting with a correctly rounded reciprocal or inverse sqrt is sufficient without some extra massaging, so it's probably best for a first pass to associate the flag with extensions exposing native implementations (similar to fma).

Any algorithms relying on divide or sqrt need to be checked as to whether they're assuming correctly rounded implementations.

jlewis-austin avatar Aug 22 '22 17:08 jlewis-austin

Does it mean you are expecting to have vulkan implementations providing correctly rounded reciprocal or inverse sqrt?

rjodinchr avatar Aug 22 '22 17:08 rjodinchr

I'm expecting that some existing Vk implementations do already provide (or at least have access to native functionality to provide) correctly rounded Divide or Square Root implementations. The idea is to provide an extension making this explicit, so that Vk implementations can support it if they do happen to provide that functionality. Then clvk can query the extension in order to claim the same capability for OpenCL via CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT.

There's no similar flag in OpenCL for reciprocal or inverse sqrt, so this wouldn't apply to them.

jlewis-austin avatar Aug 22 '22 18:08 jlewis-austin

I see. This is an optimization in clvk. You should open an issue for it. Let's keep this one about fixing rsqrt to bring the accuracy back to what is required by the OpenCL-CTS.

rjodinchr avatar Aug 22 '22 18:08 rjodinchr

with this https://github.com/google/clspv/pull/917 we should be good to just delete sqrt and rsqrt from the builtins library

aarongreig avatar Aug 25 '22 13:08 aarongreig

waiting on https://reviews.llvm.org/D134040 to be merged to close this issue

rjodinchr avatar Oct 27 '22 09:10 rjodinchr

Should be fixed by #1017

rjodinchr avatar Feb 21 '23 08:02 rjodinchr