
Complex asinh accuracy refinement

Open s-oboyle opened this issue 2 months ago • 9 comments

Update the complex asinh function to avoid numerical issues.

The current complex asinh implementation loses accuracy in several places, mostly through overflow/underflow and through catastrophic cancellation for hard input values.

This new version fixes these accuracy issues while retaining its performance.
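To make the failure modes concrete, here is a minimal sketch restricted to the real axis. It is not the code from this PR: `naive_asinh` is the textbook formula, and `sketch_asinh` is a standard real-valued rearrangement based on `log1p`. Both names and the `1e150` switch-over threshold are assumptions chosen for illustration only.

```cpp
#include <cmath>

// Textbook formula asinh(x) = log(x + sqrt(x*x + 1)), real x only.
// Illustration of the failure modes; NOT the implementation in this PR.
inline double naive_asinh(double x)
{
  return std::log(x + std::sqrt(x * x + 1.0));
}

// Hypothetical, more careful real-valued variant (again, not the PR's code).
inline double sketch_asinh(double x)
{
  const double a = std::fabs(x);
  double r;
  if (a > 1e150) {
    // a*a would overflow; here sqrt(a*a + 1) == a to double precision,
    // so asinh(a) = log(2a) = log(a) + log(2).
    r = std::log(a) + std::log(2.0);
  } else {
    // sqrt(a*a + 1) - 1 == a*a / (1 + sqrt(a*a + 1)) avoids the cancellation,
    // and log1p keeps the result accurate for tiny a.
    const double a2 = a * a;
    r = std::log1p(a + a2 / (1.0 + std::sqrt(a2 + 1.0)));
  }
  return std::copysign(r, x);  // asinh is odd
}
```

With IEEE-754 doubles, `naive_asinh(1e200)` overflows to infinity, `naive_asinh(1e-20)` collapses to 0, and `naive_asinh(-1e10)` cancels to `log(0)`, while `sketch_asinh` stays close to `std::asinh` for all three inputs. The complex case has the same hazards in both components, plus the cancellation near ±i discussed under Correctness.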

Perf

On GH100 there is little difference in performance. (There used to be a much larger perf gap until #5371 was merged, which the current version takes advantage of.) Using the math team's standard math_bench test, we get the following:

Operations/SM/cycle: casinh():

H100    old       new       new/old
fp64    0.2531    0.2549    1.01
fp32    0.6072    0.6334    1.04

Correctness

The current version has several intervals where accuracy is lost. Apart from the usual overflow/underflow suspects, there are also some very subtle intervals where accuracy is badly degraded, especially by catastrophic cancellation very close to ±i.
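One plausible source of the cancellation near ±i, assuming a formula built on z² + 1, is the real part of that expression: for z = iy it equals 1 − y², which loses most of its significant bits as |y| approaches 1. The sketch below isolates just that term. The function names are hypothetical and this is not necessarily how the PR handles it.

```cpp
#include <cmath>

// Real part of z*z + 1 for z = i*y, computed the obvious way.
inline double one_minus_y2_naive(double y)
{
  // volatile forces the product to be rounded to double first, so the
  // compiler cannot fuse it into an FMA and hide the effect.
  volatile double yy = y * y;
  return 1.0 - yy;
}

// Same quantity, factored so that the subtraction 1 - y is exact for y near 1.
inline double one_minus_y2_factored(double y)
{
  return (1.0 - y) * (1.0 + y);
}
```

For y = 1 − 2⁻³⁰ the exact value is 2⁻²⁹ − 2⁻⁶⁰. The factored form reproduces it exactly in double precision, while the naive form returns 2⁻²⁹, a relative error of about 2⁻³¹. That is many orders of magnitude above the few-ulp errors reported for the new version.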

This new version fixes these and testing gives the following:

GPU Correctness

For the new version, an intensive bracket and bisect search, along with testing special hard values, gives:

GPU fp64:
Max ulp real error (4.867,1.742) @ (0.007757045272,-0.0002247045536)    (0x3f7fc5d9fc1f5662,0xbf2d73d56affd72d)
        Ours = (0.007756967678,-0.0002246977954)    Ref = (0.007756967678,-0.0002246977954)
        Ours = (0x3f7fc5c527dd1d58,0xbf2d739b5d7a3961)               Ref = (0x3f7fc5c527dd1d53,0xbf2d739b5d7a3963)

Max ulp imag error (0.1719,5.453) @ (7.198570162e+103,-5.623976789e+101)        (0x558011effdb5a3ad,0xd510120190e898df)
        Ours = (239.8333247,-0.007812471425)    Ref = (239.8333247,-0.007812471425)
        Ours = (0x406dfaaa988c99ba,0xbf7ffff854599f01)               Ref = (0x406dfaaa988c99ba,0xbf7ffff854599efc)
GPU fp32:
Max ulp real error (6.619,2.232) @ (0.007812378462,-0.0007928675623)    (0x3bfffefb,0xba4fd871)
        Ours = (0.007812298369,-0.0007928435807)    Ref = (0.007812301628,-0.0007928434643)
        Ours = (0x3bfffe4f,0xba4fd6d5)               Ref = (0x3bfffe56,0xba4fd6d3)

Max ulp imag error (3.732,5.528) @ (0.007806597743,3.029832988e-05)     (0x3bffce7d,0x37fe292b)
        Ours = (0.007806516718,3.029741674e-05)    Ref = (0.007806518581,3.029740583e-05)
        Ours = (0x3bffcdcf,0x37fe2735)               Ref = (0x3bffcdd3,0x37fe272f)

CPU Correctness

CPU fp64:
Max ulp real error (4.125,0) @ (0.01542893159,-0)       (0x3f8f993424afec00,0x8000000000000000)
        Ours = (0.01542831951,-0)    Ref = (0.01542831951,-0)
        Ours = (0x3f8f98e1fdb37251,0x8000000000000000)               Ref = (0x3f8f98e1fdb3724d,0x8000000000000000)

Max ulp imag error (0.5078,3.484) @ (0.8869326854,-1.12001698e-254)     (0x3fec61c0a7b18800,0x8b3505783ad41800)
        Ours = (0.7991224705,-8.379245502e-255)    Ref = (0.7991224705,-8.379245502e-255)
        Ours = (0x3fe99269498d3a37,0x8b2f742347588681)               Ref = (0x3fe99269498d3a36,0x8b2f742347588684)
CPU fp32:
Max ulp real error (4.827,0.5125) @ (0.001131535857,0.9893865585)       (0x3a94500b,0x3f7d4870)
        Ours = (0.007776218932,1.424767017)    Ref = (0.007776216604,1.424766898)
        Ours = (0x3bfecfa7,0x3fb65ec4)               Ref = (0x3bfecfa2,0x3fb65ec3)

Max ulp imag error (0.498,4.695) @ (-5.695841894e+11,3.510264218e+10)   (0xd3049ddd,0x5102c47d)
        Ours = (-27.76321411,0.06155067682)    Ref = (-27.76321411,0.06155069545)
        Ours = (0xc1de1b10,0x3d7c1c90)               Ref = (0xc1de1b10,0x3d7c1c95)
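The hex bit patterns in the reports above allow a quick sanity check of the ulp figures. The sketch below counts how many representable doubles lie between two binary64 bit patterns. The helper names are hypothetical, NaNs are not handled, and the fp32 rows would need the analogous 32-bit version. The reported errors are fractional, presumably because they are measured against a higher-precision reference, so the integer distance to the rounded `Ref` value should only be expected to land close to them.

```cpp
#include <cstdint>

// Map an IEEE-754 binary64 bit pattern (sign-magnitude) to an integer that
// increases monotonically with the floating-point value. +0 and -0 both map to 0.
inline std::int64_t ordered_bits(std::uint64_t bits)
{
  const std::uint64_t sign = std::uint64_t{1} << 63;
  return (bits & sign) ? -static_cast<std::int64_t>(bits & ~sign)
                       : static_cast<std::int64_t>(bits);
}

// Number of representable doubles between the two bit patterns.
inline std::uint64_t ulp_distance(std::uint64_t a, std::uint64_t b)
{
  const std::int64_t x = ordered_bits(a);
  const std::int64_t y = ordered_bits(b);
  // Unsigned subtraction keeps the result well defined even when the two
  // values have opposite signs and large magnitudes.
  return x > y ? static_cast<std::uint64_t>(x) - static_cast<std::uint64_t>(y)
               : static_cast<std::uint64_t>(y) - static_cast<std::uint64_t>(x);
}
```

Applied to the GPU fp64 worst case for the real part, `ulp_distance(0x3f7fc5c527dd1d58, 0x3f7fc5c527dd1d53)` is 5 and the imaginary parts differ by 2, in line with the reported (4.867, 1.742).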

s-oboyle avatar Oct 31 '25 20:10 s-oboyle

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

copy-pr-bot[bot] avatar Oct 31 '25 20:10 copy-pr-bot[bot]

😬 CI Workflow Results

🟥 Finished in 2h 13m: Pass: 80%/90 | Total: 1d 06h | Max: 1h 32m | Hits: 94%/152135

See results here.

github-actions[bot] avatar Nov 03 '25 20:11 github-actions[bot]

😬 CI Workflow Results

🟥 Finished in 3h 18m: Pass: 80%/90 | Total: 23h 44m | Max: 1h 21m | Hits: 95%/154506

See results here.

github-actions[bot] avatar Nov 06 '25 21:11 github-actions[bot]

😬 CI Workflow Results

🟥 Finished in 1h 25m: Pass: 78%/90 | Total: 1d 01h | Max: 1h 22m | Hits: 96%/154393

See results here.

github-actions[bot] avatar Nov 07 '25 16:11 github-actions[bot]

/ok to test eb06d29

miscco avatar Nov 07 '25 17:11 miscco

😬 CI Workflow Results

🟥 Finished in 3h 51m: Pass: 86%/90 | Total: 20h 14m | Max: 1h 33m | Hits: 97%/190753

See results here.

github-actions[bot] avatar Nov 07 '25 21:11 github-actions[bot]

😬 CI Workflow Results

🟥 Finished in 1h 11m: Pass: 98%/90 | Total: 16h 47m | Max: 34m 29s | Hits: 96%/215042

See results here.

github-actions[bot] avatar Nov 10 '25 17:11 github-actions[bot]

/ok to test fa3bd08

s-oboyle avatar Nov 10 '25 23:11 s-oboyle

😬 CI Workflow Results

🟥 Finished in 56m 56s: Pass: 97%/90 | Total: 19h 03m | Max: 42m 03s | Hits: 97%/210866

See results here.

github-actions[bot] avatar Nov 11 '25 00:11 github-actions[bot]

🥳 CI Workflow Results

🟩 Finished in 1h 37m: Pass: 100%/88 | Total: 21h 47m | Max: 1h 23m | Hits: 98%/213613

See results here.

github-actions[bot] avatar Nov 18 '25 13:11 github-actions[bot]

🥳 CI Workflow Results

🟩 Finished in 2h 21m: Pass: 100%/90 | Total: 22h 39m | Max: 2h 20m | Hits: 97%/213937

See results here.

github-actions[bot] avatar Nov 20 '25 15:11 github-actions[bot]

/ok to test 684c44c

s-oboyle avatar Nov 21 '25 10:11 s-oboyle

🥳 CI Workflow Results

🟩 Finished in 1h 05m: Pass: 100%/90 | Total: 16h 43m | Max: 55m 05s | Hits: 98%/213937

See results here.

github-actions[bot] avatar Nov 21 '25 11:11 github-actions[bot]

/ok to test eacbe9e

s-oboyle avatar Nov 26 '25 14:11 s-oboyle

🥳 CI Workflow Results

🟩 Finished in 1h 28m: Pass: 100%/90 | Total: 18h 42m | Max: 39m 48s | Hits: 96%/206358

See results here.

github-actions[bot] avatar Nov 26 '25 15:11 github-actions[bot]

/ok to test 39dbc1c

s-oboyle avatar Dec 02 '25 16:12 s-oboyle

🥳 CI Workflow Results

🟩 Finished in 1h 25m: Pass: 100%/90 | Total: 1d 13h | Max: 1h 22m | Hits: 95%/202196

See results here.

github-actions[bot] avatar Dec 02 '25 18:12 github-actions[bot]