acosh built-in function more accurate with -cl-native-math than without

Open kpet opened this issue 4 years ago • 4 comments

#715 regressed the acosh bruteforce CTS test (-w wimpy, -1 scalar) for me (NVidia GeForce GTX 1050 Ti, Linux driver 460.56). Passing -cl-native-math to clspv fixes the issue.

This could be an issue with libclc or maybe a device-specific issue. Is anybody else getting the same behaviour?

kpet avatar Mar 07 '21 18:03 kpet

Known issue. I ran into some devices that couldn't pass a relaxed version using the GLSL extended instruction, but could with the libclc version. This has the unintended side effect of failing full conformance on some devices, but since my goal is getting any device to pass relaxed conformance, I accepted this regression as tolerable.

The only solution I can think of would be to introduce device profiles into clspv. I'm not sure what the best way to expose this would be. Ideally there wouldn't be a value per device and it could be done at a vendor level of granularity, but I bet that's overly optimistic.

alan-baker avatar Mar 08 '21 15:03 alan-baker

When implementing the library I tested on swiftshader, an nvidia card, an amd card, a mali device and an adreno device. The mali and adreno failed full conformance with the glsl extended instruction, but the mali passes the relaxed conformance. Only the adreno required the libclc version.

alan-baker avatar Mar 08 '21 16:03 alan-baker

I'm not sure I'm following. Unless I'm missing something, the issue I'm seeing is that libclc is not passing conformance but the GLSL extended instruction is. Looking at the implementation of acosh in libclc, the comments suggest the result is only an approximation. What's unclear is whether it's expected to meet the requirements of OpenCL or not.
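For anyone who wants to compare implementations outside the CTS, here is a minimal sketch of how a ULP error can be measured against a higher-precision reference. It is not the CTS harness and not the libclc code: `naive_acosh_f32` is a hypothetical stand-in candidate, float32 is emulated by rounding doubles through `struct`, and Python's `math.acosh` is assumed to be accurate enough to serve as the reference. The result can be compared against the 4 ULP bound the OpenCL C specification lists for single-precision acosh.

```python
import math
import struct


def to_f32(x):
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def ulp_f32(x):
    """Spacing of float32 values at the magnitude of x (normal, non-zero x only)."""
    _, e = math.frexp(abs(x))       # abs(x) = m * 2**e with 0.5 <= m < 1
    return math.ldexp(1.0, e - 24)  # float32 has a 24-bit significand


def ulp_error(result, reference):
    """Distance between a float32 result and a double reference, in float32 ULPs."""
    return abs(result - reference) / ulp_f32(reference)


def naive_acosh_f32(x):
    """Hypothetical candidate: log(x + sqrt(x*x - 1)), rounded to float32 after each step."""
    x = to_f32(x)
    t = to_f32(x * x)
    t = to_f32(t - 1.0)
    t = to_f32(math.sqrt(t))
    t = to_f32(x + t)
    return to_f32(math.log(t))


def worst_ulp_error(candidate, inputs):
    """Largest ULP error of candidate over inputs, with math.acosh as the reference."""
    return max(ulp_error(candidate(x), math.acosh(to_f32(x))) for x in inputs)


if __name__ == "__main__":
    # Sample points strictly above 1.0, where acosh is non-zero.
    inputs = [1.0 + k / 64.0 for k in range(1, 4096)]
    print("worst ULP error:", worst_ulp_error(naive_acosh_f32, inputs))
```

Swapping `naive_acosh_f32` for a port of whichever implementation is under suspicion gives a quick host-side number, though only a run on the actual device settles a conformance question.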

Also, the CTS doesn't test acosh in relaxed mode. Were you using a patched version of the CTS?

Brainstorming solutions, the first/cheapest that comes to mind would be to allow -cl-native-math to take a list of built-in functions with tables in clvk, but given the description for -cl-native-math, clspv would still have the freedom to change the implementation in such a way that native could become less accurate than the GLSL instruction (even if it's unlikely to do so). Maybe an explicit way of selecting which built-ins are implemented using GLSL extended instructions?

kpet avatar Mar 08 '21 19:03 kpet

I'll have to dig through my results again because you're right that OpenCL in relaxed mode considers acosh a derived implementation. Not sure when I'll get to that investigation though.

alan-baker avatar Mar 08 '21 19:03 alan-baker

This is now fixed. The issue was with the libclc implementation of fma being buggy. It has been fixed by #1072.
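The thread does not say what was wrong with the libclc fma, so the following is background only: a small sketch of the property a correct fma has to provide, namely that `a*b + c` is rounded once instead of twice. It uses doubles and exact rational arithmetic on the host (the issue itself concerns float32 on the device), and `fma_reference` and `unfused` are hypothetical helper names. A math built-in that relies on fma to recover low-order bits can lose accuracy when that single-rounding guarantee does not hold.

```python
from fractions import Fraction


def fma_reference(a, b, c):
    """a*b + c computed exactly, then rounded once to a double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def unfused(a, b, c):
    """a*b rounded to a double first, then c added: two roundings."""
    return a * b + c


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

# The exact product is 1 - 2**-60, which rounds to 1.0 as a double,
# so the unfused form loses the whole result.
print(unfused(a, b, c))        # 0.0
print(fma_reference(a, b, c))  # -2**-60, about -8.67e-19
```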

rjodinchr avatar Apr 18 '23 12:04 rjodinchr