TransformerEngine
[PyTorch] Re-enable bias+GELU fusion for non-reentrant checkpointing -- WIP
TorchDynamo has known limitations with autograd.Function implementations and autograd.graph hooks. Activation recompute relies on both of those mechanisms, so this PR disables TorchDynamo on te.distributed.checkpoint() via the @no_torch_dynamo() decorator.
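For illustration, a minimal sketch of the decorator pattern described above, assuming a `no_torch_dynamo()` helper that wraps a function with `torch._dynamo.disable` when Dynamo is available (the names and checkpoint wrapper here are illustrative, not TE's exact internals):

```python
import torch


def no_torch_dynamo():
    """Return a decorator that excludes a function from TorchDynamo tracing."""
    def decorator(fn):
        if hasattr(torch, "_dynamo"):
            # Mark `fn` so Dynamo falls back to eager execution inside it.
            return torch._dynamo.disable(fn)
        # Older PyTorch builds without Dynamo: nothing to disable.
        return fn
    return decorator


@no_torch_dynamo()
def checkpoint(function, *args, **kwargs):
    # Stand-in for te.distributed.checkpoint(): the recompute logic
    # (autograd.Function + autograd.graph hooks) now runs outside of
    # Dynamo tracing, avoiding the limitations noted above.
    return torch.utils.checkpoint.checkpoint(
        function, *args, use_reentrant=False, **kwargs
    )
```

The decorator only skips tracing for the checkpoint entry point itself; the rest of the model can still be compiled as usual.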
@ksivaman Did we implement/merge lazy init for TE/PyTorch yet? If so, I can rebase, test, and merge this to re-enable the fusion with checkpointing.