
Torch compile support for distributed operations

Open · gluonfield opened this issue 1 year ago · 2 comments

🚀 Feature

The documentation says that torch.compile is not supported with distributed training right now. Since torch.compile can speed up training by as much as 2x, using the Lightning Trainer without compilation is no longer cost efficient, and it would be great to support it.

It's a bit unclear to me what happens if I compile the model before passing it to the LightningModule: will the compiled model actually be used under DDP or not?
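
For context, here is a minimal plain-PyTorch sketch of the two possible orderings (not Lightning-specific, and just my understanding; it assumes launch via `torchrun` so the process group env vars are set):

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes launch via `torchrun --nproc_per_node=N script.py`,
# which sets the env vars that init_process_group reads.
dist.init_process_group(backend="nccl")
device = torch.device("cuda", dist.get_rank() % torch.cuda.device_count())

model_a = nn.Linear(32, 32).to(device)
model_b = nn.Linear(32, 32).to(device)

# Ordering A: compile first, then wrap in DDP. DDP wraps the
# OptimizedModule returned by torch.compile; this runs, but the
# compiler never sees DDP, so it cannot optimize around DDP's
# gradient synchronization.
model_a = DDP(torch.compile(model_a), device_ids=[device.index])

# Ordering B: wrap in DDP first, then compile. Dynamo's DDPOptimizer
# can then split the graph at DDP bucket boundaries so communication
# overlaps with computation.
model_b = torch.compile(DDP(model_b, device_ids=[device.index]))
```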

gluonfield avatar Sep 13 '24 16:09 gluonfield

@AugustDev Thank you! Did you want to file this here or with https://github.com/Lightning-AI/pytorch-lightning/issues ?

t-vi avatar Sep 13 '24 18:09 t-vi

The approach we took in Fabric should be transferable to the Trainer as well: https://github.com/Lightning-AI/pytorch-lightning/pull/19280 https://github.com/Lightning-AI/pytorch-lightning/pull/19382 Essentially, it just ensures that torch.compile is applied over the FSDP/DDP-wrapped model.
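
A minimal sketch of that flow with Fabric, based on my reading of the linked PRs (hedged; the exact re-application behavior is internal and may differ between versions):

```python
import torch
import torch.nn as nn
import lightning as L

fabric = L.Fabric(accelerator="cuda", devices=2, strategy="ddp")
fabric.launch()

model = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 1))

# Compile first; per the linked PRs, fabric.setup() detects the
# compiled model and re-applies torch.compile on top of the DDP
# wrapper it creates, so the compiled graph sees DDP.
model = torch.compile(model)
model = fabric.setup(model)
```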

awaelchli avatar Sep 16 '24 14:09 awaelchli