lightning-thunder icon indicating copy to clipboard operation
lightning-thunder copied to clipboard

Move Grad AllReduce Bucketing to inside of `thunder.executors.passes.transform_for_execution` from `torch_autograd.split_forward_backward`

Open crcrpar opened this issue 2 years ago • 1 comments

What does this PR do?

Fixes #184

As per title. The changes are addition of a logic to tell whether or not the input TraceCtx represents DDP backward.

cc @carmocca @awaelchli @crcrpar

crcrpar avatar Apr 18 '24 08:04 crcrpar

The rebase to https://github.com/Lightning-AI/lightning-thunder/pull/222/commits/b855247e171527d4bc523cd4e2ca44b8461460c4 seems to give me a bug which at glance is unintelligible where comms are there even under no_sync. The change of this pr doesn't look that related per se...

crcrpar avatar May 05 '24 06:05 crcrpar

This is the oldest non-draft PR with all green CI. Do we want this as is or do we want to change things yet, @crcrpar @IvanYashchuk ?

t-vi avatar May 30 '24 09:05 t-vi

I don't have any out of my head

crcrpar avatar May 30 '24 16:05 crcrpar

I'm sorry for the delay. I'll review the changes tomorrow.

IvanYashchuk avatar May 30 '24 17:05 IvanYashchuk

Should we try to get this in? @crcrpar @IvanYashchuk

lantiga avatar May 30 '24 18:05 lantiga

@t-vi does this interact with the fsdp work?

lantiga avatar May 30 '24 18:05 lantiga

I think it is orthogonal.

t-vi avatar May 30 '24 18:05 t-vi

This is for ddp. We could reuse this for fsdp backward as well but the semantic would be different from what we have today

crcrpar avatar May 31 '24 01:05 crcrpar

Should we try to get this in? @crcrpar @IvanYashchuk

@lantiga, yes, let's get this in

IvanYashchuk avatar May 31 '24 08:05 IvanYashchuk