LibFewShot

The MAML method does not appear to support multi-GPU training

Open ypy516478793 opened this issue 3 years ago • 3 comments

The MAML method trains fine on a single GPU, but training in parallel on multiple GPUs raises an error. The traceback is as follows:

  File "/home/cougarnet.uh.edu/pyuan2/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/home/cougarnet.uh.edu/pyuan2/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/cougarnet.uh.edu/pyuan2/Projects/LibFewShot/core/model/backbone/conv_four.py", line 69, in forward
    out1 = self.layer1(x)
  File "/home/cougarnet.uh.edu/pyuan2/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/cougarnet.uh.edu/pyuan2/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/cougarnet.uh.edu/pyuan2/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/cougarnet.uh.edu/pyuan2/Projects/LibFewShot/core/model/backbone/utils/maml_module.py", line 63, in forward
    if self.weight.fast is not None and self.bias.fast is not None:
AttributeError: 'Tensor' object has no attribute 'fast'
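
A likely reading of this traceback, not confirmed in the thread: the layer in `maml_module.py` stores its MAML fast weights as a Python attribute (`.fast`) on the parameter object, and multi-GPU replication hands each replica tensors that do not carry that attribute. The sketch below reproduces the same effect on CPU with `clone()` standing in for replication. `LinearFast` is a hypothetical layer written for illustration, not LibFewShot's code, and the `getattr` fallback only avoids the crash: a replica would silently fall back to its ordinary weights, so this is not a fix for multi-GPU MAML.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LinearFast(nn.Linear):
    """Hypothetical layer that keeps MAML fast weights on `.fast` attributes."""

    def __init__(self, in_features, out_features):
        super().__init__(in_features, out_features)
        # Custom Python attributes attached to the Parameter objects.
        self.weight.fast = None
        self.bias.fast = None

    def forward(self, x):
        # getattr with a default tolerates tensors that lack `.fast`.
        w_fast = getattr(self.weight, "fast", None)
        b_fast = getattr(self.bias, "fast", None)
        if w_fast is not None and b_fast is not None:
            return F.linear(x, w_fast, b_fast)
        return F.linear(x, self.weight, self.bias)


layer = LinearFast(4, 2)

# A tensor derived from the parameter does not inherit its Python attributes.
copied = layer.weight.clone()
has_fast_on_param = hasattr(layer.weight, "fast")
has_fast_on_copy = hasattr(copied, "fast")

# With no fast weights set, the layer uses its ordinary parameters.
out = layer(torch.zeros(3, 4))

# Once fast weights are set, the forward pass uses them instead.
layer.weight.fast = torch.ones(2, 4)
layer.bias.fast = torch.zeros(2)
out_fast = layer(torch.ones(1, 4))
```

Accessing `copied.fast` directly would raise the same `AttributeError` as in the traceback above.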

ypy516478793, Oct 08 '21 15:10

Hello, thank you for the feedback. We are working on this issue and will reply as soon as possible.

wZuck, Oct 10 '21 05:10

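Hello. Regarding the multi-GPU problem with the MAML method that you reported, we have confirmed that the problem does exist. Supporting multiple GPUs would require fairly large changes to the code. We plan to release an update later that fixes this and other larger issues.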

yangcedrus, Oct 11 '21 11:10

OK, thank you!

ypy516478793, Oct 11 '21 15:10

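MAML can now be trained on multiple GPUs.

One unresolved issue is that MAML cannot be used together with SyncBatchNorm under DistributedDataParallel. We will later analyze how the missing synchronization affects the final results and look for a suitable solution.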

yangcedrus, Sep 26 '22 06:09