acosh built-in function more accurate with -cl-native-math than without
#715 regressed the acosh bruteforce CTS test (-w wimpy, -1 scalar) for me (NVIDIA GeForce GTX 1050 Ti, Linux driver 460.56). Passing -cl-native-math to clspv fixes the issue.
This could be an issue with libclc or maybe a device-specific issue. Is anybody else getting the same behaviour?
This is a known issue. I ran into some devices that couldn't pass a relaxed version using the GLSL extended instruction, but could with the libclc version. This has the unintended side effect of failing full conformance on some devices, but since my goal is getting any device to pass a relaxed conformance, I accepted this regression as tolerable.
The only solution I can think of would be to introduce device profiles into clspv. I'm not sure what the best way to expose this would be. Ideally there wouldn't be a value per device and it could be done at a vendor level of granularity, but I bet that's overly optimistic.
When implementing the library I tested on SwiftShader, an NVIDIA card, an AMD card, a Mali device and an Adreno device. The Mali and Adreno failed full conformance with the GLSL extended instruction, but the Mali passed the relaxed conformance. Only the Adreno required the libclc version.
I'm not sure I'm following. Unless I'm missing something, the issue I'm seeing is that libclc is not passing conformance but the GLSL extended instruction is. Looking at the implementation of acosh in libclc, the comments suggest the result is only an approximation; what's unclear is whether it's expected to meet the requirements of OpenCL or not.
Also, the CTS doesn't test acosh in relaxed mode. Were you using a patched version of the CTS?
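For anyone who wants to sanity-check an acosh implementation against the OpenCL full-profile bound (4 ULP for acosh) outside the CTS, here is a minimal sketch of the measurement. Everything in it is an assumption for illustration: `naive_acosh_f32` is a textbook formula evaluated with float32 rounding after every step, not libclc's or any driver's actual code, and `ulp_f32` is a simplified ULP definition that only approximates what the CTS uses (it is meaningless for a reference of exactly 0).

```python
import math
import struct


def to_f32(x):
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def ulp_f32(x):
    """Size of one float32 ULP at the magnitude of x (simplified, x != 0)."""
    _, e = math.frexp(abs(x))  # abs(x) = m * 2**e with 0.5 <= m < 1
    # float32 has a 24-bit significand; clamp at the smallest subnormal.
    return math.ldexp(1.0, max(e - 24, -149))


def ulp_error(result, reference):
    """Error of `result` in float32 ULPs, measured against a double reference."""
    return abs(result - reference) / ulp_f32(reference)


def naive_acosh_f32(x):
    """Hypothetical acosh: log(x + sqrt(x*x - 1)), rounded to float32 at each step."""
    x = to_f32(x)
    t = to_f32(x * x)
    t = to_f32(t - 1.0)
    t = to_f32(math.sqrt(t))
    t = to_f32(x + t)
    return to_f32(math.log(t))


if __name__ == "__main__":
    # Inputs close to 1 are where the naive formula loses accuracy,
    # because x*x - 1 cancels most of the significant bits.
    for x in (1.0 + 2.0 ** -12, 1.5, 2.0, 100.0):
        err = ulp_error(naive_acosh_f32(x), math.acosh(x))
        print(f"x={x!r}: {err:.2f} ULP")
```

The same `ulp_error` helper can be pointed at values read back from a device to see which inputs exceed the bound, which may help tell a libclc problem apart from a device-specific one.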
Brainstorming solutions, the first/cheapest that comes to mind would be to allow -cl-native-math to take a list of built-in functions with tables in clvk. But given the description for -cl-native-math, clspv would still have the freedom to change the implementation in such a way that native could become less accurate than the GLSL instruction (even if it's unlikely to do so). Maybe an explicit way of selecting which built-ins are implemented using GLSL extended instructions?
I'll have to dig through my results again because you're right that OpenCL in relaxed mode considers acosh a derived implementation. Not sure when I'll get to that investigation though.
This is now fixed. The cause was a buggy libclc implementation of fma, which has been fixed by #1072.