
multiple-gpu case

Open zhuolinumd opened this issue 5 years ago • 3 comments

@HongguLiu have you tested your training code with multiple GPUs? I get RuntimeError: NCCL Error 2: unhandled system error. The single-GPU case works fine for me. Thanks.

zhuolinumd, Jan 24 '20 20:01

To train a model with multiple GPUs, we use model = nn.DataParallel(model). If you trained a model with multiple GPUs, you must unwrap it before testing: if isinstance(model, torch.nn.DataParallel): model = model.module
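A minimal sketch of the wrap-and-unwrap pattern described above. The `nn.Linear` layer is a hypothetical stand-in for the repository's real network, and the helper name `unwrap_for_testing` is an assumption made for illustration, not part of the project's code.

```python
import torch
import torch.nn as nn


def unwrap_for_testing(model: nn.Module) -> nn.Module:
    """Return the underlying module if the model is wrapped in DataParallel."""
    if isinstance(model, nn.DataParallel):
        model = model.module
    return model


# Hypothetical stand-in for the real detection network.
toy = nn.Linear(4, 2)

# Training side: wrap the model so batches are split across visible GPUs.
# With GPUs present the model should be moved to the GPU before wrapping.
if torch.cuda.is_available():
    toy = toy.cuda()
wrapped = nn.DataParallel(toy)

# Testing side: recover the plain module before evaluation.
restored = unwrap_for_testing(wrapped)

# An unwrapped model passes through unchanged.
same = unwrap_for_testing(toy)
```

Unwrapping matters mostly when loading checkpoints: a state dict saved from the wrapped model has every key prefixed with `module.`, so saving `model.module.state_dict()` instead tends to make checkpoints easier to load on a single GPU.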

HongguLiu, Jan 25 '20 13:01

@HongguLiu Thanks for clarifying the testing case, but I was asking about training, where I get the NCCL error. Have you successfully completed training with multiple GPUs? If so, could you update your Python requirements file https://github.com/HongguLiu/Deepfake-Detection/blob/master/requirements.txt to include more details about the Python environment? It could be a PyTorch issue.
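For anyone comparing environments, here is a rough sketch of a script that collects the details such a report would need. The function name `collect_env_report` and the choice of fields are assumptions for illustration; the script is not part of the repository.

```python
import importlib.util
import platform


def collect_env_report() -> dict:
    """Gather Python, PyTorch, CUDA, and NCCL details for a bug report."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    # Degrade gracefully when PyTorch is not installed.
    if importlib.util.find_spec("torch") is None:
        report["torch"] = "not installed"
        return report

    import torch

    report["torch"] = torch.__version__
    report["cuda_build"] = torch.version.cuda  # None on CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    report["gpu_count"] = torch.cuda.device_count()
    if torch.cuda.is_available():
        try:
            report["nccl"] = torch.cuda.nccl.version()
        except Exception:
            # Some builds (for example on Windows) ship without NCCL.
            report["nccl"] = "unavailable"
    return report


report = collect_env_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

Setting the environment variable `NCCL_DEBUG=INFO` before launching training makes NCCL print diagnostic logs, which may help narrow down an "unhandled system error".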

zhuolinumd, Jan 25 '20 15:01

We usually train our model with multiple GPUs, and this code supports multi-GPU training.

HongguLiu, Feb 03 '20 02:02