
Multiple GPUs

Open jilner opened this issue 6 years ago • 1 comment

Chapter 5: I keep getting the following error when using multiple GPUs:

File "", line 1, in runfile('/home/skycloud/桌面/mulgpu/mulg2.py', wdir='/home/skycloud/桌面/mulgpu')

File "/home/skycloud/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 705, in runfile execfile(filename, namespace)

File "/home/skycloud/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile exec(compile(f.read(), filename, 'exec'), namespace)

File "/home/skycloud/桌面/mulgpu/mulg2.py", line 127, in gen_fake = generator.forward(z)

File "/home/skycloud/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 122, in forward replicas = self.replicate(self.module, self.device_ids[:len(inputs)])

File "/home/skycloud/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 127, in replicate return replicate(module, device_ids)

File "/home/skycloud/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/replicate.py", line 12, in replicate param_copies = Broadcast.apply(devices, *params)

File "/home/skycloud/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 19, in forward outputs = comm.broadcast_coalesced(inputs, ctx.target_gpus)

File "/home/skycloud/anaconda3/lib/python3.6/site-packages/torch/cuda/comm.py", line 40, in broadcast_coalesced return torch._C._broadcast_coalesced(tensors, devices, buffer_size)

RuntimeError: NCCL Error 2: unhandled system error

jilner, Jan 12 '19 04:01

This is probably related to your environment configuration. I suggest first getting the code to run correctly on a single GPU.

chenyuntc, Mar 25 '19 20:03
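
One way to follow the single-GPU suggestion without editing the training script is to launch it in a child process that can see only one device. The sketch below is a minimal illustration, not part of the book's code: the helper names `single_gpu_debug_env` and `run_on_single_gpu` are hypothetical, and it assumes the script does not hard-code `device_ids` for more than one GPU. `CUDA_VISIBLE_DEVICES` and `NCCL_DEBUG` are standard CUDA and NCCL environment variables.

```python
import os
import subprocess
import sys


def single_gpu_debug_env(base_env=None, device_index=0):
    """Return a copy of the environment limited to one GPU, with NCCL logging on.

    Hypothetical helper for illustration only.
    """
    # Copy so the caller's environment (or os.environ) is never modified.
    env = dict(os.environ if base_env is None else base_env)
    # Expose a single device to CUDA, so the script runs as a single-GPU job.
    env["CUDA_VISIBLE_DEVICES"] = str(device_index)
    # Ask NCCL for verbose logs; mainly useful once multi-GPU runs are re-enabled.
    env["NCCL_DEBUG"] = "INFO"
    return env


def run_on_single_gpu(script_path, device_index=0):
    """Run a Python script in a child process that sees only one GPU."""
    env = single_gpu_debug_env(device_index=device_index)
    # Use the current interpreter so the same packages are available.
    return subprocess.run([sys.executable, script_path], env=env).returncode
```

For example, `run_on_single_gpu("mulg2.py")` would run the script from the traceback on GPU 0 only. If it passes there, rerunning on all GPUs with `NCCL_DEBUG=INFO` still set may print more detail about the "unhandled system error", which can help narrow down whether the driver, NCCL, or the system configuration is at fault.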