OpenCL-CTS Enabled tgamma test for bruteforce suite

Fixes #507 according to work plan from issue description.

Jan 28 '25 14:01 shajder

Still reviewing/testing this. Thanks.

Aug 25 '25 15:08 lakshmih

We would like to run this test since it is a new builtin being tested but could you rebase to tip first?

Sep 03 '25 17:09 lakshmih

We would like to run this test since it is a new builtin being tested but could you rebase to tip first?

@lakshmih done

Sep 05 '25 10:09 shajder

Hello, we would like more time to review this if possible, thank you.

Sep 08 '25 13:09 AhmedAmraniAkdi

Need more time before this gets merged, still evaluating.

Sep 24 '25 00:09 lakshmih

Our implementation won't pass this test. We'll need some time to put some compiler engineers onto fixing it.

Sep 26 '25 12:09 paulfradgley

Can we update the reference algorithm? It produces wrong output for this result:

Input -> -1.92592994438723585306e-34 Reference value - Wolfram -> -5.19229685853482762853e+33 Reference value - CTS -> -1.03845937170696552571e+34 (Wrong)

CTS uses openlibm's tgammal. We suggest using openlibm's tgamma, which calculates the result correctly.

tgamma -> -5.19229685853482762853e+33 tgammal -> -1.03845937170696552571e+34 (Wrong)

openlibm also defaults to tgamma on systems where long double = double

Oct 14 '25 16:10 lakshmih

Can we update the reference algorithm? It produces wrong output for this result:

Input -> -1.92592994438723585306e-34 Reference value - Wolfram -> -5.19229685853482762853e+33 Reference value - CTS -> -1.03845937170696552571e+34 (Wrong)

@lakshmih hmm, I just verified your case. Are you sure you are using latest commit for this PR ? I get a proper result (aligned with Wolfram output), am I missing something?

Oct 16 '25 10:10 shajder

We’re now able to compute the correct reference values. However, we had a question regarding the required precision for the host reference:

Is 64-bit precision sufficient for this use case, or is 128-bit precision necessary? We’ve noticed that using 128-bit significantly increases execution time, so we’d like to confirm whether that level of precision is truly required.

Oct 29 '25 22:10 lakshmih

We’re now able to compute the correct reference values. However, we had a question regarding the required precision for the host reference:

Is 64-bit precision sufficient for this use case, or is 128-bit precision necessary? We’ve noticed that using 128-bit significantly increases execution time, so we’d like to confirm whether that level of precision is truly required.

My perspective:

The 128-bit version of the reference function was significantly easier to port - 230 vs. 878 lines of code.
The assumption was that the 128-bit version offers higher precision, which seems desirable.

Nov 04 '25 14:11 shajder

We’ve noticed that using 128-bit significantly increases execution time, so we’d like to confirm whether that level of precision is truly required.

@lakshmih as discussed in teleconference here is draft PR for comparison purpose

Nov 05 '25 11:11 shajder