Complex asinh accuracy refinement
Update the complex asinh function to avoid numerical issues.
The current complex asinh function loses accuracy in several places. These mostly relate to overflow and underflow, and to catastrophic cancellation for hard inputs.
This new version fixes these accuracy issues while retaining its performance.
Perf
On GH100 there is little performance difference. (There used to be a much larger gap until #5371 was merged, which the current version benefits from.) Using the math team's standard math_bench test, we measure the following:
Operations/SM/cycle:
casinh():
| H100 | old | new | new/old |
|---|---|---|---|
| fp64 | 0.2531 | 0.2549 | 1.01 |
| fp32 | 0.6072 | 0.6334 | 1.04 |
Correctness
The current version has several intervals where accuracy is lost.
Apart from the usual overflow and underflow suspects, there are also some very subtle intervals where accuracy is badly degraded, especially by catastrophic cancellation very close to ±i.
The new version fixes these cases; testing gives the following:
GPU Correctness
For the new version, an intensive bracket and bisect search, along with testing special hard values, gives:
GPU fp64:
Max ulp real error (4.867,1.742) @ (0.007757045272,-0.0002247045536) (0x3f7fc5d9fc1f5662,0xbf2d73d56affd72d)
Ours = (0.007756967678,-0.0002246977954) Ref = (0.007756967678,-0.0002246977954)
Ours = (0x3f7fc5c527dd1d58,0xbf2d739b5d7a3961) Ref = (0x3f7fc5c527dd1d53,0xbf2d739b5d7a3963)
Max ulp imag error (0.1719,5.453) @ (7.198570162e+103,-5.623976789e+101) (0x558011effdb5a3ad,0xd510120190e898df)
Ours = (239.8333247,-0.007812471425) Ref = (239.8333247,-0.007812471425)
Ours = (0x406dfaaa988c99ba,0xbf7ffff854599f01) Ref = (0x406dfaaa988c99ba,0xbf7ffff854599efc)
GPU fp32:
Max ulp real error (6.619,2.232) @ (0.007812378462,-0.0007928675623) (0x3bfffefb,0xba4fd871)
Ours = (0.007812298369,-0.0007928435807) Ref = (0.007812301628,-0.0007928434643)
Ours = (0x3bfffe4f,0xba4fd6d5) Ref = (0x3bfffe56,0xba4fd6d3)
Max ulp imag error (3.732,5.528) @ (0.007806597743,3.029832988e-05) (0x3bffce7d,0x37fe292b)
Ours = (0.007806516718,3.029741674e-05) Ref = (0.007806518581,3.029740583e-05)
Ours = (0x3bffcdcf,0x37fe2735) Ref = (0x3bffcdd3,0x37fe272f)
CPU Correctness
CPU fp64:
Max ulp real error (4.125,0) @ (0.01542893159,-0) (0x3f8f993424afec00,0x8000000000000000)
Ours = (0.01542831951,-0) Ref = (0.01542831951,-0)
Ours = (0x3f8f98e1fdb37251,0x8000000000000000) Ref = (0x3f8f98e1fdb3724d,0x8000000000000000)
Max ulp imag error (0.5078,3.484) @ (0.8869326854,-1.12001698e-254) (0x3fec61c0a7b18800,0x8b3505783ad41800)
Ours = (0.7991224705,-8.379245502e-255) Ref = (0.7991224705,-8.379245502e-255)
Ours = (0x3fe99269498d3a37,0x8b2f742347588681) Ref = (0x3fe99269498d3a36,0x8b2f742347588684)
CPU fp32:
Max ulp real error (4.827,0.5125) @ (0.001131535857,0.9893865585) (0x3a94500b,0x3f7d4870)
Ours = (0.007776218932,1.424767017) Ref = (0.007776216604,1.424766898)
Ours = (0x3bfecfa7,0x3fb65ec4) Ref = (0x3bfecfa2,0x3fb65ec3)
Max ulp imag error (0.498,4.695) @ (-5.695841894e+11,3.510264218e+10) (0xd3049ddd,0x5102c47d)
Ours = (-27.76321411,0.06155067682) Ref = (-27.76321411,0.06155069545)
Ours = (0xc1de1b10,0x3d7c1c90) Ref = (0xc1de1b10,0x3d7c1c95)