
How to use it with Multi GPU

Hesene opened this issue 6 years ago • 12 comments

Thank you for sharing!!! When I run with a single GPU it runs well, but when I run with multiple GPUs I get the error RuntimeError: Function CatBackward returned an invalid gradient at index 1 - expected device cuda:1 but got cuda:0. Could you give some advice on this error?

Hesene avatar Aug 10 '19 03:08 Hesene

@Hesene Hello Hesene, in my lab I only have a single 2080 Ti, so I cannot reproduce this issue. I'm sorry about that!

zhoudaxia233 avatar Aug 13 '19 15:08 zhoudaxia233

> @Hesene Hello Hesene, in my lab I only have a single 2080 Ti, so I cannot reproduce this issue. I'm sorry about that!

Ok, thank you for your code, it helped me a lot.

Hesene avatar Aug 13 '19 15:08 Hesene

I face the same problem. Which part is the cause?

AtsunoriFujita avatar Oct 15 '19 20:10 AtsunoriFujita

did you use torch.nn.DataParallel()?

goodgoodstudy92 avatar Jan 17 '20 11:01 goodgoodstudy92

> did you use torch.nn.DataParallel()?

No, I didn't, but I think it may work.

zhoudaxia233 avatar Jan 18 '20 04:01 zhoudaxia233

> I face the same problem. Which part is the cause?

I'm not sure, but you could try integrating nn.DataParallel() into the source code.

zhoudaxia233 avatar Jan 18 '20 04:01 zhoudaxia233
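For anyone landing here, the suggestion above amounts to wrapping the model before calling it. A minimal sketch with a hypothetical stand-in module (`TinyNet` is not part of this repo; the real constructors are `get_efficientunet_b0` and friends, but any `nn.Module` wraps the same way):

```python
import torch
from torch import nn

# TinyNet is a hypothetical stand-in for the U-Net; any nn.Module
# is wrapped by DataParallel in exactly the same way.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = TinyNet().to(device)

# nn.DataParallel splits the batch along dim 0 and replicates the module onto
# every visible GPU each forward pass; with no GPUs it falls back to a plain call.
model = nn.DataParallel(model)

x = torch.randn(4, 3, 32, 32, device=device)
out = model(x)
print(tuple(out.shape))  # (4, 8, 32, 32)
```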

> I face the same problem. Which part is the cause?

> I'm not sure, but you could try integrating nn.DataParallel() into the source code.

I use EfficientNet as the backbone to train an object detection model, and nn.DataParallel() works fine; the only issue is that multi-GPU training is quite slow.

goodgoodstudy92 avatar Jan 18 '20 05:01 goodgoodstudy92
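On the speed point: `nn.DataParallel` re-replicates the module on every forward pass and gathers outputs and gradients through a single device, which often makes multi-GPU training slower than expected; PyTorch's documentation recommends `DistributedDataParallel` instead. A minimal single-process sketch of the wrapping pattern (the CPU `gloo` backend is used here only so it runs anywhere; in practice you launch one process per GPU with `torchrun`):

```python
import os

import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Normally torchrun sets up one process per GPU; this single-process
# "gloo" group on CPU only demonstrates the wrapping pattern.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29507")
dist.init_process_group("gloo", rank=0, world_size=1)

model = nn.Conv2d(3, 8, kernel_size=3, padding=1)  # stand-in for the U-Net
ddp_model = DDP(model)  # gradients are all-reduced across ranks during backward

out = ddp_model(torch.randn(2, 3, 16, 16))
out.sum().backward()
print(tuple(out.shape))  # (2, 8, 16, 16)

dist.destroy_process_group()
```

Unlike `DataParallel`, each process keeps its own model replica alive between iterations, so there is no per-step replication cost.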

I'm seeing a similar issue when running with nn.DataParallel:

RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/ryanstout/.local/share/virtualenvs/arsenal_train2-TlJZ47AR/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/home/ryanstout/.local/share/virtualenvs/arsenal_train2-TlJZ47AR/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ryanstout/.local/share/virtualenvs/arsenal_train2-TlJZ47AR/lib/python3.7/site-packages/efficientunet/efficientunet.py", line 106, in forward
    x = torch.cat([x, blocks.popitem()[1]], dim=1)
RuntimeError: All input tensors must be on the same device. Received cuda:0 and cuda:1

Any ideas?

Thanks!

ryanstout avatar Apr 25 '20 18:04 ryanstout
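The traceback points at `torch.cat([x, blocks.popitem()[1]], dim=1)`, where `blocks` holds the encoder's skip features. One plausible cause (echoed later in this thread) is that the container collecting those features ends up shared across `DataParallel` replicas, so a replica on `cuda:1` can pop a tensor produced on `cuda:0`. A replica-safe pattern keeps the skip dict local to each `forward` call; a toy sketch (not the repo's actual code):

```python
from collections import OrderedDict

import torch
from torch import nn

class TinyUnet(nn.Module):
    """Toy U-Net-like model; skip features live in a local dict, never on self."""

    def __init__(self):
        super().__init__()
        self.enc1 = nn.Conv2d(3, 8, 3, padding=1)
        self.enc2 = nn.Conv2d(8, 16, 3, padding=1)
        self.dec = nn.Conv2d(16 + 8, 8, 3, padding=1)

    def forward(self, x):
        blocks = OrderedDict()  # local: each replica gets its own container
        x = self.enc1(x)
        blocks["enc1"] = x      # tensors here stay on this replica's device
        x = self.enc2(x)
        # pop the most recent skip and concatenate along channels,
        # but from a per-call container rather than shared state
        x = torch.cat([x, blocks.popitem()[1]], dim=1)
        return self.dec(x)

model = nn.DataParallel(TinyUnet())
out = model(torch.randn(4, 3, 32, 32))
print(tuple(out.shape))  # (4, 8, 32, 32)
```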

> I'm seeing a similar issue when running with nn.DataParallel:
>
> RuntimeError: Caught RuntimeError in replica 0 on device 0.
> Original Traceback (most recent call last):
>   File "/home/ryanstout/.local/share/virtualenvs/arsenal_train2-TlJZ47AR/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
>     output = module(*input, **kwargs)
>   File "/home/ryanstout/.local/share/virtualenvs/arsenal_train2-TlJZ47AR/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
>     result = self.forward(*input, **kwargs)
>   File "/home/ryanstout/.local/share/virtualenvs/arsenal_train2-TlJZ47AR/lib/python3.7/site-packages/efficientunet/efficientunet.py", line 106, in forward
>     x = torch.cat([x, blocks.popitem()[1]], dim=1)
> RuntimeError: All input tensors must be on the same device. Received cuda:0 and cuda:1
>
> Any ideas?
>
> Thanks!

Hi, bro. Have you solved the problem?

Vipermdl avatar Sep 22 '20 02:09 Vipermdl

I suspect this problem is due to the sharing of some module in EfficientUnet, which leaves that module on only one GPU, perhaps the encoder…

If-only1 avatar Nov 08 '20 12:11 If-only1
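One way this kind of sharing can happen in practice: features collected by a forward hook into a plain dict attribute. `DataParallel` replicas are created with a shallow copy of the module's attributes, and a hook closure keeps pointing at the original module, so every replica can end up writing into one shared dict. A toy reproduction of the hazardous pattern (`HookedEncoder` is hypothetical, not this repo's code), runnable on CPU:

```python
import torch
from torch import nn

class HookedEncoder(nn.Module):
    """The hazard pattern: features stashed via a hook into a dict on self.

    Under nn.DataParallel, tensors written into such a dict can come from
    different replicas, i.e. different devices.
    """

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)
        self.features = {}  # shared mutable state -- the suspect
        self.conv.register_forward_hook(
            lambda mod, inp, out: self.features.update({"conv": out})
        )

    def forward(self, x):
        return self.conv(x)

enc = HookedEncoder()
out = enc(torch.randn(1, 3, 8, 8))
# The hook stashed the feature map on the module itself:
print("conv" in enc.features)  # True
```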

> I suspect this problem is due to the sharing of some module in EfficientUnet, which leaves that module on only one GPU, perhaps the encoder…

I agree, I'm now facing the same problem.

TianyiFranklinWang avatar Mar 13 '21 07:03 TianyiFranklinWang

@NPU-Franklin Franklin created a PR (#11) to support multiple GPUs. I don't have multiple cards, so I cannot test it, but maybe you can give it a try.

zhoudaxia233 avatar Apr 20 '21 09:04 zhoudaxia233