SA-SSD
Multi-GPU training error
I am training on multiple GPUs, but the following error occurs:
Traceback (most recent call last):
  File "./train.py", line 131, in <module>
    main()
  File "./train.py", line 82, in main
    model = MMDistributedDataParallel(model.cuda(),find_unused_parameters=True)
  File "/home/ubuntu-502/xu/CIA-SSD/envs/CIA-SSD/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 305, in __init__
    self._ddp_init_helper()
  File "/home/ubuntu-502/xu/CIA-SSD/envs/CIA-SSD/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 323, in _ddp_init_helper
    self._module_copies = replicate(self.module, self.device_ids, detach=True)
  File "/home/ubuntu-502/xu/CIA-SSD/envs/CIA-SSD/lib/python3.6/site-packages/torch/nn/parallel/replicate.py", line 88, in replicate
    param_copies = _broadcast_coalesced_reshape(params, devices, detach)
  File "/home/ubuntu-502/xu/CIA-SSD/envs/CIA-SSD/lib/python3.6/site-packages/torch/nn/parallel/replicate.py", line 67, in _broadcast_coalesced_reshape
    return comm.broadcast_coalesced(tensors, devices)
  File "/home/ubuntu-502/xu/CIA-SSD/envs/CIA-SSD/lib/python3.6/site-packages/torch/cuda/comm.py", line 39, in broadcast_coalesced
    return torch._C._broadcast_coalesced(tensors, devices, buffer_size)
RuntimeError: all tensors must be on devices[0]
Has anyone else hit this issue, or can anyone help me track down this error? Thank you very much!
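This error usually means the model's parameters ended up on a GPU other than the one DDP expects for this process: each rank must bind to its own device (with `torch.cuda.set_device`) *before* calling `.cuda()`, and pass a matching `device_ids=[local_rank]`. A minimal sketch of that setup, assuming a one-GPU-per-process launch; the helper name `build_ddp` is my own, and SA-SSD's actual `train.py` uses mmcv's `MMDistributedDataParallel` wrapper rather than this plain `DistributedDataParallel`:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel


def build_ddp(model, local_rank):
    """Wrap a model for one-GPU-per-process distributed training.

    Assumes dist.init_process_group(...) has already been called.
    """
    if torch.cuda.is_available():
        # Bind this process to its own GPU *before* moving the model.
        # If every rank moves its model with a bare .cuda(), all replicas
        # land on cuda:0 and DDP raises "all tensors must be on devices[0]".
        torch.cuda.set_device(local_rank)
        model = model.cuda(local_rank)
        return DistributedDataParallel(
            model,
            device_ids=[local_rank],   # exactly one device per process
            output_device=local_rank,
            find_unused_parameters=True,
        )
    # CPU fallback (gloo backend): no device_ids needed.
    return DistributedDataParallel(model, find_unused_parameters=True)
```

With this pattern, `local_rank` typically comes from the launcher (e.g. the `--local_rank` argument injected by `torch.distributed.launch`), so rank 0 trains on `cuda:0`, rank 1 on `cuda:1`, and so on.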