GradNorm Can it be used in DDP?

Open chengjianhong opened this issue 3 years ago • 2 comments

Hi, I use GradNorm in my segmentation and classification task. I want to train it with DistributedDataParallel, but it raises the error "RuntimeError: derivative for batch_norm_backward_elemt is not implemented". Can you give me some advice?

  Lgard.backward()
  File "/homeb/jhcheng/anaconda3/envs/py37-torch/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/homeb/jhcheng/anaconda3/envs/py37-torch/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: derivative for batch_norm_backward_elemt is not implemented

Apr 24 '21 03:04 chengjianhong

I'm having a similar problem. I'm trying to differentiate the gradient norm with DDP, but I get the same error message. It works fine (I think) on a single GPU.

It also works with DDP without SyncBatchNorm, so I am guessing that this problem is related to SyncBatchNorm.

Dec 02 '21 09:12 lthilnklover

I am facing the same issue!

Jul 01 '22 13:07 danieltudosiu
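
The observation above (the error disappears when SyncBatchNorm is not used) suggests one possible workaround: keep DDP, but avoid SyncBatchNorm in the part of the network that the gradient-norm loss has to be differentiated through. The sketch below is only an illustration under that assumption and is not taken from the GradNorm repository. The helper name `replace_batchnorm_with_groupnorm`, the `num_groups` default, and the toy model are all hypothetical. Swapping in GroupNorm changes the model's normalization behavior, so results may differ from a BatchNorm model.

```python
import torch
import torch.nn as nn


def replace_batchnorm_with_groupnorm(module: nn.Module, num_groups: int = 8) -> nn.Module:
    """Recursively swap BatchNorm/SyncBatchNorm children for GroupNorm.

    Hypothetical helper: GroupNorm needs no cross-GPU statistics, so it
    does not rely on the SyncBatchNorm backward kernels.
    """
    for name, child in module.named_children():
        # nn.SyncBatchNorm and nn.BatchNorm{1,2,3}d all derive from _BatchNorm.
        if isinstance(child, nn.modules.batchnorm._BatchNorm):
            channels = child.num_features
            # Fall back to a single group if the channels are not divisible.
            groups = num_groups if channels % num_groups == 0 else 1
            setattr(module, name, nn.GroupNorm(groups, channels))
        else:
            replace_batchnorm_with_groupnorm(child, num_groups)
    return module


# Toy model (an assumption for illustration, not the model from this issue).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Conv2d(8, 2, kernel_size=1),
)
model = replace_batchnorm_with_groupnorm(model)

x = torch.randn(4, 3, 16, 16)
loss = model(x).mean()

# GradNorm-style step: take the gradient w.r.t. a shared weight while keeping
# the graph, then backpropagate through the norm of that gradient.
shared_weight = model[0].weight
(grad,) = torch.autograd.grad(loss, shared_weight, create_graph=True)
grad_norm = grad.norm()
grad_norm.backward()  # second-order pass through the normalization layer
```

This runs on CPU without DDP, so it only checks that the second-order pass goes through once the BatchNorm layers are replaced. Whether GroupNorm is acceptable for a given segmentation or classification model is a separate question; simply not converting the model to SyncBatchNorm, as reported above, is the less invasive option.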
