Scale Model Training with LightningLite

Open • aniketmaurya opened this issue 3 years ago • 4 comments

Is your feature/enhancement request related to a problem? Please describe.

Hi, love this project 💜! The Trainer could be enhanced with support for multiple hardware accelerators and distributed training, which would help when training on a large dataset or with a large model.

Describe the solution you'd like

We have been developing LightningLite, which lets you use all the capabilities of PyTorch Lightning Accelerators without refactoring your training loop. You can read more about it in our blog post: Scale your PyTorch code with LightningLite.

Additional context

Overall, if you are fine with it, I am happy to draft a PR with the suggested change so we can verify its impact in place. Feel free to get in touch and let me know what you think. Thanks! 😄

aniketmaurya • Apr 01 '22 10:04

@aniketmaurya sounds interesting and we'd be happy to check out a draft PR!

There are currently two trainer classes: the LanguageModelTrainer you linked to trains language models, and the Trainer class trains all other models (NER, text classification, etc.). Let me know if you have any questions!

alanakbik • Apr 01 '22 10:04

Thanks for the response, @alanakbik. I see both are of type torch.nn.Module, so we can scale both of them. I will raise a draft PR soon.

aniketmaurya • Apr 01 '22 11:04

Very nice! 💜 Would love to see how PL could help here! 🐰

Borda • Apr 01 '22 12:04

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] • Jul 30 '22 19:07

Let's keep it open.

aniketmaurya • Sep 13 '22 05:09