geo-deep-learning
Implementing Distributed Data Parallel
Currently, GDL uses Data Parallel (DP). According to the PyTorch documentation, Distributed Data Parallel (DDP) is much faster than DP.
From the PyTorch documentation: "There are significant caveats to using CUDA models with multiprocessing; unless care is taken to meet the data handling requirements exactly, it is likely that your program will have incorrect or undefined behavior.
It is recommended to use DistributedDataParallel, instead of DataParallel to do multi-GPU training, even if there is only a single node.
The difference between DistributedDataParallel and DataParallel is: DistributedDataParallel uses multiprocessing where a process is created for each GPU, while DataParallel uses multithreading. By using multiprocessing, each GPU has its dedicated process, this avoids the performance overhead caused by GIL of Python interpreter."
More info: https://pytorch.org/docs/master/generated/torch.nn.DataParallel.html
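For reference, a minimal sketch of what a DDP training loop could look like, assuming a launch via `torchrun --nproc_per_node=<n_gpus>`. The model, dataset, and hyperparameters below are placeholders, not GDL's actual training code; the point is the DDP boilerplate (process group init, `DistributedDataParallel` wrapping, `DistributedSampler`).

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def main():
    # torchrun sets LOCAL_RANK, RANK and WORLD_SIZE for each spawned process.
    local_rank = int(os.environ["LOCAL_RANK"])
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    # Placeholder model and dataset, standing in for GDL's segmentation model.
    model = torch.nn.Linear(10, 2).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(1024, 10), torch.randint(0, 2, (1024,)))
    # DistributedSampler gives each process its own shard of the dataset,
    # instead of DP's single process splitting each batch across GPUs.
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    criterion = torch.nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for inputs, targets in loader:
            inputs = inputs.cuda(local_rank)
            targets = targets.cuda(local_rank)
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()  # gradients are all-reduced across processes here
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Compared to DP, the main changes for GDL would be launching one process per GPU, wrapping the model in `DistributedDataParallel` instead of `DataParallel`, and sharding the dataset with `DistributedSampler`.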