Move Grad AllReduce Bucketing to inside of `thunder.executors.passes.transform_for_execution` from `torch_autograd.split_forward_backward`
What does this PR do?
Fixes #184
As per title. The change adds logic to determine whether or not the input TraceCtx represents a DDP backward trace.
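A minimal sketch of what such a check could look like, using hypothetical, simplified stand-ins for thunder's `TraceCtx` and bound-symbol types (the real classes and the actual comm-op names in thunder may differ — this is an illustration of the idea, not the PR's implementation):

```python
from dataclasses import dataclass, field

# Hypothetical, simplified stand-ins for thunder's trace structures;
# the real TraceCtx holds a sequence of bound symbols (operations).
@dataclass
class BoundSymbol:
    name: str

@dataclass
class TraceCtx:
    bound_symbols: list = field(default_factory=list)

# Assumed names of communication primitives that would only show up
# in a DDP backward trace (gradient all-reduce and its wait).
DDP_BACKWARD_COMM_OPS = {"all_reduce", "wait"}

def is_ddp_backward(trace: TraceCtx) -> bool:
    """Heuristic: treat a trace as DDP backward if it contains
    gradient all-reduce communication symbols."""
    return any(bsym.name in DDP_BACKWARD_COMM_OPS for bsym in trace.bound_symbols)

# Usage: a forward-like trace has no comm ops, a backward-like one does.
fwd_trace = TraceCtx([BoundSymbol("matmul"), BoundSymbol("relu")])
bwd_trace = TraceCtx([BoundSymbol("matmul_backward"), BoundSymbol("all_reduce")])
```

With a predicate like this, `transform_for_execution` could apply grad all-reduce bucketing only when the trace it receives is a DDP backward trace.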
cc @carmocca @awaelchli @crcrpar
Rebasing onto https://github.com/Lightning-AI/lightning-thunder/pull/222/commits/b855247e171527d4bc523cd4e2ca44b8461460c4 seems to introduce a bug that is unintelligible at first glance: comms are present even under no_sync. The change in this PR doesn't look related per se...
This is the oldest non-draft PR with all green CI. Do we want this as is or do we want to change things yet, @crcrpar @IvanYashchuk ?
I don't have anything off the top of my head.
I'm sorry for the delay. I'll review the changes tomorrow.
Should we try to get this in? @crcrpar @IvanYashchuk
@t-vi does this interact with the fsdp work?
I think it is orthogonal.
This is for DDP. We could reuse this for the FSDP backward as well, but the semantics would be different from what we have today.
Should we try to get this in? @crcrpar @IvanYashchuk
@lantiga, yes, let's get this in