project-based-learning
Add a tutorial for distributed training
Description
I'd like to share a tutorial that shows, step by step, how to train an AI model in a distributed manner.
Motivation and Context
As deep learning models grow larger, training them in a non-distributed manner becomes difficult. This tutorial clearly explained to me:
- What distributed training is
- How to use efficient parallelization techniques to perform and speed up AI model training
- How to define my own parallel model, with the help of some advanced tutorials
It is quite interesting and helpful for a deep learning researcher.
How Has This Been Tested?
There are many popular parallelization techniques for distributed training, but I did not have a general idea of them. This tutorial gave me a clear view of these advanced parallelization techniques, and I can apply all of them to my code after reading it. I feel I can write distributed deep learning models just as I write a model on my laptop. Most importantly, it can greatly reduce my training time. Hence, I recommend it to everyone who needs to train deep learning models.
Types of changes
- [ ] Content Update (change which fixes an issue or updates an already existing submission)
- [x] New Article (change which adds functionality)
- [ ] Documentation change
Checklist:
- [x] My code follows the code style of this project.
- [x] I have updated the documentation accordingly.
- [x] I have read the CONTRIBUTING document.
- [x] I have made checks to ensure URLs and other resources are valid.