geo-deep-learning
Implementing Distributed Data Parallel
Currently, GDL uses Data Parallel (DP). According to the PyTorch documentation, Distributed Data Parallel (DDP) is much faster than DP.
From the PyTorch documentation: "There are significant caveats to using CUDA models with multiprocessing; unless care is taken to meet the data handling requirements exactly, it is likely that your program will have incorrect or undefined behavior.
It is recommended to use DistributedDataParallel, instead of DataParallel to do multi-GPU training, even if there is only a single node.
The difference between DistributedDataParallel and DataParallel is: DistributedDataParallel uses multiprocessing where a process is created for each GPU, while DataParallel uses multithreading. By using multiprocessing, each GPU has its dedicated process, this avoids the performance overhead caused by GIL of Python interpreter."
More info: https://pytorch.org/docs/master/generated/torch.nn.DataParallel.html
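For reference, a minimal sketch of what a DDP training loop could look like, assuming a launch via `torchrun --nproc_per_node=<n_gpus>`. The model, dataset, and hyperparameters below are placeholders, not GDL's actual training code; the point is the DDP boilerplate (process group init, `DistributedDataParallel` wrapping, `DistributedSampler`).

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def main():
    # torchrun sets LOCAL_RANK, RANK and WORLD_SIZE for each spawned process.
    local_rank = int(os.environ["LOCAL_RANK"])
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    # Placeholder model and dataset, standing in for GDL's segmentation model.
    model = torch.nn.Linear(10, 2).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(1024, 10), torch.randint(0, 2, (1024,)))
    # DistributedSampler gives each process its own shard of the dataset,
    # instead of DP's single process splitting each batch across GPUs.
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    criterion = torch.nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for inputs, targets in loader:
            inputs = inputs.cuda(local_rank)
            targets = targets.cuda(local_rank)
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()  # gradients are all-reduced across processes here
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Compared to DP, the main changes for GDL would be launching one process per GPU, wrapping the model in `DistributedDataParallel` instead of `DataParallel`, and sharding the dataset with `DistributedSampler`.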