pytorch-book
Multiple GPUs
In Chapter 5, I keep getting the following error when using multiple GPUs:
File "
File "/home/skycloud/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 705, in runfile execfile(filename, namespace)
File "/home/skycloud/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile exec(compile(f.read(), filename, 'exec'), namespace)
File "/home/skycloud/桌面/mulgpu/mulg2.py", line 127, in
File "/home/skycloud/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 122, in forward replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
File "/home/skycloud/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 127, in replicate return replicate(module, device_ids)
File "/home/skycloud/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/replicate.py", line 12, in replicate param_copies = Broadcast.apply(devices, *params)
File "/home/skycloud/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 19, in forward outputs = comm.broadcast_coalesced(inputs, ctx.target_gpus)
File "/home/skycloud/anaconda3/lib/python3.6/site-packages/torch/cuda/comm.py", line 40, in broadcast_coalesced return torch._C._broadcast_coalesced(tensors, devices, buffer_size)
RuntimeError: NCCL Error 2: unhandled system error
This is probably related to your environment configuration. I suggest first getting the code to run correctly on a single GPU.
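
One way to try the single-GPU suggestion without editing the script is to hide all but one GPU from the process and turn on NCCL logging. The sketch below is only an illustration: `debug_env` is a hypothetical helper, not part of PyTorch, and it assumes your CUDA and NCCL installation honors the standard `CUDA_VISIBLE_DEVICES` and `NCCL_DEBUG` environment variables. The script name in the usage comment is taken from the traceback above.

```python
import os


def debug_env(single_gpu=True, gpu_index=0, base=None):
    """Build an environment dict for debugging a multi-GPU script.

    Hypothetical helper for illustration; nothing here is a PyTorch API.
    """
    # Copy so the caller's environment is never modified in place.
    env = dict(os.environ if base is None else base)
    # Ask NCCL to log more detail about what it is doing.
    env["NCCL_DEBUG"] = "INFO"
    if single_gpu:
        # Expose only one device, so the script sees a single GPU.
        env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


# Example usage (not executed here):
#   import subprocess, sys
#   subprocess.run([sys.executable, "mulg2.py"], env=debug_env())
```

If the single-GPU run passes, rerunning with `single_gpu=False` should make NCCL print more detail around the failure, which may help narrow down which part of the environment is at fault.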