Yan Wang
> Is there a small repro that we can use to test this PR?

Yeah, I added a case to test it.
Hi @KaelanDt, could you help review this?
Hi @KaelanDt, I think it's ready to merge, could you take a look?
It's just slow to test. For example, `test_thunder_specific_reports` tests a function that has a Dynamo graph break and a Thunder split, which results in 4 subgraphs, and for each...
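For reference, here is a minimal sketch of the kind of function involved (hypothetical, not the actual test case; it assumes the `ThunderCompiler` Dynamo backend): the explicit graph break forces Dynamo to split the graph, and the Thunder splitter can split the pieces further, so one function yields several subgraphs to analyze.

```python
import torch
from thunder.dynamo import ThunderCompiler

def fn(x):
    y = torch.sin(x)
    torch._dynamo.graph_break()  # forces a Dynamo-level split into two graphs
    return torch.cos(y)

backend = ThunderCompiler()           # Thunder as a torch.compile backend
cfn = torch.compile(fn, backend=backend)
cfn(torch.randn(4, 4))
```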
An update: while saving the repro script to debug this issue, I encountered another accuracy problem on the RTX6000: the Thunder-compiled function outputs `nan`, whereas both eager mode and torch.compile...
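A minimal sketch of the kind of comparison that surfaces this (not the repro script itself; it assumes a function returning a single tensor):

```python
import torch
import thunder

def compare(fn, *args):
    expected = fn(*args)               # eager reference
    actual = thunder.jit(fn)(*args)    # Thunder-compiled run
    print("nan in compiled output:", torch.isnan(actual).any().item())
    torch.testing.assert_close(actual, expected)
```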
On GB200, there's a related issue in NVFuser: https://github.com/NVIDIA/Fuser/issues/5346
> This happens because in the forward we are calling torch.nn.functional.linear but in the backward, we are calling te_functional_linear_backward (without ever calling the TE's forward).

https://github.com/Lightning-AI/lightning-thunder/blob/60f3ee1ec536ee8d6fdef503af54525e0a3978a4/thunder/torch/__init__.py#L5319-L5331 Checkpointing uses vjp, is the...
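For context, a minimal sketch of the pattern under discussion (an assumption about the setup, not the code from this PR): a checkpointed linear under `thunder.jit`, where the recomputed backward can dispatch to an executor's backward even though that executor's forward never ran.

```python
import torch
import thunder
from torch.utils.checkpoint import checkpoint

def region(x, w):
    # traced as torch.nn.functional.linear in the forward
    return torch.nn.functional.linear(x, w)

def fn(x, w):
    return checkpoint(region, x, w, use_reentrant=False)

jfn = thunder.jit(fn)  # with the TE executor enabled, the backward of this
                       # region may end up in te_functional_linear_backward
x = torch.randn(16, 16, device="cuda", requires_grad=True)
w = torch.randn(16, 16, device="cuda", requires_grad=True)
jfn(x, w).sum().backward()
```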
https://github.com/Lightning-AI/lightning-thunder/blob/60f3ee1ec536ee8d6fdef503af54525e0a3978a4/thunder/core/transforms.py#L2819 the input trace is:

```
@torch.no_grad()
@no_autocast
def flat_func(*flat_args):
  # flat_args: "Collection"
  t0, = flat_args
  t1 = ltorch.cos(t0)  # t1: "cuda:0 f32[16, 16]"
    # t1 = prims.cos(t0)  # t1:...
```
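For anyone reproducing this, a trace of roughly that shape can be printed with something like the following sketch (it shows the final trace via `thunder.last_traces`, not the exact intermediate trace passed to vjp):

```python
import torch
import thunder

def f(x):
    return torch.cos(x)

jf = thunder.jit(f)
jf(torch.randn(16, 16, device="cuda"))
print(thunder.last_traces(jf)[-1])  # prints a trace similar in shape to the one quoted above
```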
Hi @kshitij12345

> My earlier assumption was that `try_execute_thunder_symbol` only invokes the meta function associated with the Torch symbol (without interacting with the executors). Could the reason for this behavior be...
Hi @IvanYashchuk @kshitij12345, could you take a look again? I added the repro in issue #2120.