GradNorm
Can it be used in DDP?
Hi, I use GradNorm for a joint segmentation and classification task, and I want to train it with DistributedDataParallel. However, it fails with the error "RuntimeError: derivative for batch_norm_backward_elemt is not implemented". Can you give me some advice?
Lgard.backward()
File "/homeb/jhcheng/anaconda3/envs/py37-torch/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/homeb/jhcheng/anaconda3/envs/py37-torch/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: derivative for batch_norm_backward_elemt is not implemented
I'm having a similar problem. I'm trying to differentiate through the gradient norm with DDP and get the same error message. It works fine (I think) on a single GPU.
It also works with DDP when SyncBatchNorm is not used, so I am guessing that this problem is related to SyncBatchNorm.
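For reference, here is a minimal single-process sketch of the kind of computation being discussed: taking the gradient norm with respect to shared weights with create_graph=True and then calling backward on a loss built from those norms. The second backward pass has to differentiate through the batch norm backward op. The model (ToyNet), the tensor shapes, and the simplified target (the detached mean of the norms) are all assumptions made up for illustration, not code from this repository, and it uses plain BatchNorm2d, which is the case reported to work above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)


class ToyNet(nn.Module):
    """Hypothetical two-task model: shared conv + BatchNorm trunk, two heads."""

    def __init__(self):
        super().__init__()
        self.shared = nn.Conv2d(3, 4, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(4)  # plain BatchNorm, NOT SyncBatchNorm
        self.head_a = nn.Conv2d(4, 1, kernel_size=1)
        self.head_b = nn.Conv2d(4, 1, kernel_size=1)

    def forward(self, x):
        # BatchNorm sits after the shared layer, so gradients w.r.t.
        # self.shared.weight flow backward through it.
        h = torch.relu(self.bn(self.shared(x)))
        return self.head_a(h), self.head_b(h)


model = ToyNet()
task_weights = torch.ones(2, requires_grad=True)  # learnable loss weights

x = torch.randn(8, 3, 16, 16)
target_a = torch.randn(8, 1, 16, 16)
target_b = torch.randn(8, 1, 16, 16)

out_a, out_b = model(x)
losses = torch.stack([F.mse_loss(out_a, target_a), F.mse_loss(out_b, target_b)])

# Gradient norm of each weighted task loss w.r.t. the shared weights.
# create_graph=True keeps the graph so the norms are themselves differentiable.
norms = []
for i in range(2):
    (g,) = torch.autograd.grad(
        task_weights[i] * losses[i], model.shared.weight, create_graph=True
    )
    norms.append(g.norm())
norms = torch.stack(norms)

# Simplified target (an assumption): pull each norm toward the detached mean.
grad_loss = (norms - norms.mean().detach()).abs().sum()

# Second backward pass: differentiates through the batch norm backward op.
grad_loss.backward()
print(task_weights.grad)
```

If the same script raises the error only after the BatchNorm layers are converted to SyncBatchNorm and wrapped in DDP, that would be consistent with the guess above.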
I am facing the same issue!