project-based-learning Add a tutorial for distributed training

Add a tutorial for distributed training

Open Sze-qq opened this issue 2 years ago • 0 comments

Description

I'd like to share a tutorial that teaches, step by step, how to train AI models in a distributed manner.

Motivation and Context

As deep learning models grow larger, training them in a non-distributed manner becomes increasingly difficult. This tutorial clearly explains:

What distributed training is
How to use efficient parallelization techniques to run and speed up AI model training
How to define my own parallel model, through some advanced tutorials

It is quite interesting and helpful for deep learning researchers.

How Has This Been Tested?

There are many popular parallelization techniques for distributed training, but I did not have a general picture of them. This tutorial gave me a clear view of these advanced techniques, and after reading it I can apply all of them to my own code. I feel I can write distributed deep learning models just as I write models on my laptop. Most importantly, it can greatly reduce my training time. Hence, I recommend it to anyone who needs to train deep learning models.

Types of changes

[ ] Content Update (change which fixes an issue or updates an already existing submission)
[x] New Article (change which adds functionality)
[ ] Documentation change

Checklist:

[x] My code follows the code style of this project.
[x] I have updated the documentation accordingly.
[x] I have read the CONTRIBUTING document.
[x] I have made checks to ensure URLs and other resources are valid

May 23 '22 09:05 Sze-qq
