AnimeStylized

Multi-GPU training problem

Open maodong2056 opened this issue 5 years ago • 4 comments

  File "scripts/whiteboxgan.py", line 186, in training_step
    vgg_output = self.pretrained(output)
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/cyr/17_style/networks/pretrainnet.py", line 172, in forward
    return self._forward_impl(x)
  File "/home/cyr/17_style/networks/pretrainnet.py", line 165, in _forward_impl
    x = self._process(x)
  File "/home/cyr/17_style/networks/pretrainnet.py", line 156, in _process
    return self.vgg_normalize(bgr) # vgg norm
  File "/home/cyr/17_style/networks/pretrainnet.py", line 161, in <lambda>
    self.vgg_normalize = lambda x: x - mean
RuntimeError: expected device cuda:1 but got device cuda:0

maodong2056, Mar 12 '21 02:03

Hello, training runs fine on a single GPU, but it fails when I use two.

maodong2056, Mar 12 '21 02:03

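Sorry, I've been a bit busy lately. This happens because I only had one GPU before and didn't consider the multi-GPU case. Copying the model onto both GPUs should fix it. If you have already fixed it, feel free to open a pull request; otherwise I may fix it when I have time.

zhen8838, Mar 13 '21 12:03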
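
The traceback ends at `self.vgg_normalize = lambda x: x - mean`, which suggests `mean` is a plain tensor captured by the lambda. The repository's code is not shown in this thread, so that is an assumption. A tensor held only in a closure is not moved when the module is replicated to a second GPU, so it would stay on `cuda:0` while the input arrives on `cuda:1`. One common way around this is to register the mean as a module buffer. The sketch below is only an illustration: the class name `VGGNormalize` is hypothetical, and the BGR mean values are the usual Caffe-style VGG means, assumed here rather than taken from the project.

```python
import torch
import torch.nn as nn


class VGGNormalize(nn.Module):
    """Subtract a per-channel mean that travels with the module (hypothetical helper)."""

    def __init__(self, mean=(103.939, 116.779, 123.68)):
        super().__init__()
        # Assumed BGR means, shaped (1, 3, 1, 1) so they broadcast over NCHW input.
        mean_tensor = torch.tensor(mean, dtype=torch.float32).view(1, 3, 1, 1)
        # A registered buffer follows .to()/.cuda() and is copied to every
        # replica by nn.DataParallel; a tensor captured in a lambda is not.
        self.register_buffer("mean", mean_tensor)

    def forward(self, x):
        # self.mean lives on the same device as this replica's input.
        return x - self.mean


# Runs on CPU; on a multi-GPU machine the buffer moves together with the module.
norm = VGGNormalize()
x = torch.zeros(2, 3, 4, 4)
y = norm(x)
```

Note that a registered buffer also appears in the module's `state_dict`, which may matter when loading checkpoints saved before such a change.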

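Thanks for the reply. I changed the trainer parameter "distributed_backend: dp" and the number of GPUs, and fixed a few small bugs. It now uses multiple GPUs, but training is still slower than on a single card, and I don't know why o(╥﹏╥)o

maodong2056, Mar 15 '21 08:03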
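
The thread does not establish why `dp` ran slower than a single card. One thing that may be worth checking is the per-device batch size: `nn.DataParallel` splits each input batch along the first dimension, so with an unchanged batch size every GPU processes only a fraction of it, while replicating the model and gathering outputs adds overhead on each step. The PyTorch documentation generally recommends `DistributedDataParallel` (the `ddp` backend in PyTorch Lightning) over `DataParallel` for this reason. The sketch below only illustrates the split on CPU tensors; the function name and the batch size of 16 are made up for the example.

```python
import torch


def per_device_batches(batch, num_devices):
    """Split a batch along dim 0, the way a data-parallel scatter divides inputs."""
    return torch.chunk(batch, num_devices, dim=0)


# Hypothetical batch of 16 images split across 2 devices.
batch = torch.randn(16, 3, 256, 256)
shards = per_device_batches(batch, 2)
sizes = [shard.shape[0] for shard in shards]
```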

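Hello, could you show exactly what you changed? I also want to switch to multi-GPU training, but I haven't managed to get it working yet.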

boya34, Sep 15 '22 12:09