LibFewShot
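The MAML method does not seem to support multi-GPU training
The MAML method trains fine on a single GPU, but training in parallel on multiple GPUs raises an error. The traceback is: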
File "/home/cougarnet.uh.edu/pyuan2/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/home/cougarnet.uh.edu/pyuan2/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/cougarnet.uh.edu/pyuan2/Projects/LibFewShot/core/model/backbone/conv_four.py", line 69, in forward
out1 = self.layer1(x)
File "/home/cougarnet.uh.edu/pyuan2/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/cougarnet.uh.edu/pyuan2/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
input = module(input)
File "/home/cougarnet.uh.edu/pyuan2/anaconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/cougarnet.uh.edu/pyuan2/Projects/LibFewShot/core/model/backbone/utils/maml_module.py", line 63, in forward
if self.weight.fast is not None and self.bias.fast is not None:
AttributeError: 'Tensor' object has no attribute 'fast'
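One plausible reading of this traceback, assuming `maml_module.py` stores the MAML fast weights as a plain Python attribute (`fast`) on each `nn.Parameter`: the per-GPU replicas built by `torch.nn.DataParallel` hold tensors copied from the original parameters rather than the original `Parameter` objects, so the custom attribute is not carried over. The minimal sketch below is not LibFewShot code. It only illustrates, on CPU, that an attribute attached to a parameter is absent on a tensor derived from it; the variable names are made up for illustration.

```python
import torch
import torch.nn as nn

# A parameter with a MAML-style "fast weight" slot attached as a plain
# Python attribute (assumed to mirror what maml_module.py does).
weight = nn.Parameter(torch.zeros(3))
weight.fast = None

# A tensor derived from the parameter is a new object. Custom Python
# attributes set on the original are not copied onto it.
derived = weight.clone()

has_fast_on_original = hasattr(weight, "fast")
has_fast_on_derived = hasattr(derived, "fast")

print("original has .fast:", has_fast_on_original)
print("derived has .fast:", has_fast_on_derived)
```

If this reading is right, accessing `self.weight.fast` inside a replica would fail in the same way as the last line of the traceback.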
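Hello, thank you for the feedback. We are working on this problem and will reply as soon as possible.
Hello, regarding the multi-GPU problem with the MAML method: we have confirmed that the problem exists. Supporting multiple GPUs would require fairly large changes to the code, so we plan to fix this and other larger problems together in a later update.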
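OK, thanks!
MAML can now be trained on multiple GPUs.
One problem remains unsolved: under DistributedDataParallel, MAML cannot be used together with SyncBatchNorm. We will analyze how the missing synchronization affects the final results and look for a solution.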
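Until that is resolved, one possible workaround is to skip the SyncBatchNorm conversion for MAML-style models and keep ordinary per-process BatchNorm. The sketch below is not taken from LibFewShot: `prepare_batchnorm`, `make_backbone`, and the `is_maml` flag are hypothetical names. The only real library call is `torch.nn.SyncBatchNorm.convert_sync_batchnorm`.

```python
import torch.nn as nn


def prepare_batchnorm(model: nn.Module, is_maml: bool) -> nn.Module:
    """Hypothetical helper: convert BatchNorm layers to SyncBatchNorm,
    except for MAML-style models, which keep per-process BatchNorm."""
    if is_maml:
        # Leave BatchNorm untouched; statistics are then not synchronized
        # across processes.
        return model
    return nn.SyncBatchNorm.convert_sync_batchnorm(model)


def make_backbone() -> nn.Module:
    # Tiny stand-in for a real backbone, for illustration only.
    return nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())


maml_model = prepare_batchnorm(make_backbone(), is_maml=True)
other_model = prepare_batchnorm(make_backbone(), is_maml=False)

print(type(maml_model[1]).__name__)
print(type(other_model[1]).__name__)
```

The returned model would then be wrapped in `DistributedDataParallel` as usual. With this choice each process computes its own batch statistics, which is exactly the missing synchronization whose effect on results is still to be analyzed.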