Yan Wang
> Is there a small repro that we can use to test this PR?

Yeah, I added a case to test it.
Hi @KaelanDt, could you help review this?
Hi @KaelanDt, I think it's ready to merge, could you take a look?
It's just slow to test. For example, `test_thunder_specific_reports` tests a function that has a Dynamo graph break and a Thunder split, which results in 4 subgraphs, and for each...
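For reference, here is a minimal sketch of the kind of function involved (hypothetical, not the actual test case; it assumes the `ThunderCompiler` Dynamo backend): the explicit graph break forces Dynamo to split the graph, and the Thunder splitter can split the pieces further, so one function yields several subgraphs to analyze.

```python
import torch
from thunder.dynamo import ThunderCompiler

def fn(x):
    y = torch.sin(x)
    torch._dynamo.graph_break()  # forces a Dynamo-level split into two graphs
    return torch.cos(y)

backend = ThunderCompiler()           # Thunder as a torch.compile backend
cfn = torch.compile(fn, backend=backend)
cfn(torch.randn(4, 4))
```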
An update: while saving the repro script to debug this issue, I encountered another accuracy problem on the RTX6000: the Thunder-compiled function outputs `nan`, whereas both eager mode and torch.compile...
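A minimal sketch of the kind of comparison that surfaces this (not the repro script itself; it assumes a function returning a single tensor):

```python
import torch
import thunder

def compare(fn, *args):
    expected = fn(*args)               # eager reference
    actual = thunder.jit(fn)(*args)    # Thunder-compiled run
    print("nan in compiled output:", torch.isnan(actual).any().item())
    torch.testing.assert_close(actual, expected)
```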
On GB200, there's a related issue in NVFuser: https://github.com/NVIDIA/Fuser/issues/5346
> This happens because in the forward we are calling torch.nn.functional.linear but in the backward, we are calling te_functional_linear_backward (without ever calling the TE's forward).

https://github.com/Lightning-AI/lightning-thunder/blob/60f3ee1ec536ee8d6fdef503af54525e0a3978a4/thunder/torch/__init__.py#L5319-L5331 Checkpointing uses vjp, is the...
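For context, a minimal sketch of the pattern under discussion (an assumption about the setup, not the code from this PR): a checkpointed linear under `thunder.jit`, where the recomputed backward can dispatch to an executor's backward even though that executor's forward never ran.

```python
import torch
import thunder
from torch.utils.checkpoint import checkpoint

def region(x, w):
    # traced as torch.nn.functional.linear in the forward
    return torch.nn.functional.linear(x, w)

def fn(x, w):
    return checkpoint(region, x, w, use_reentrant=False)

jfn = thunder.jit(fn)  # with the TE executor enabled, the backward of this
                       # region may end up in te_functional_linear_backward
x = torch.randn(16, 16, device="cuda", requires_grad=True)
w = torch.randn(16, 16, device="cuda", requires_grad=True)
jfn(x, w).sum().backward()
```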
https://github.com/Lightning-AI/lightning-thunder/blob/60f3ee1ec536ee8d6fdef503af54525e0a3978a4/thunder/core/transforms.py#L2819 the input trace is:

```
@torch.no_grad()
@no_autocast
def flat_func(*flat_args):
  # flat_args: "Collection"
  t0, = flat_args
  t1 = ltorch.cos(t0)  # t1: "cuda:0 f32[16, 16]"
    # t1 = prims.cos(t0)  # t1:...
```
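For anyone reproducing this, a trace of roughly that shape can be printed with something like the following sketch (it shows the final trace via `thunder.last_traces`, not the exact intermediate trace passed to vjp):

```python
import torch
import thunder

def f(x):
    return torch.cos(x)

jf = thunder.jit(f)
jf(torch.randn(16, 16, device="cuda"))
print(thunder.last_traces(jf)[-1])  # prints a trace similar in shape to the one quoted above
```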
Hi @kshitij12345

> My earlier assumption was that `try_execute_thunder_symbol` only invokes the meta function associated with the Torch symbol (without interacting with the executors). Could the reason for this behavior be...
Hi @IvanYashchuk @kshitij12345, could you take a look again? I added the repro in issue #2120.