GradNorm

This is my demo of Chen et al., "GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks", ICML 2018.
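For readers who just want the gist of the method, below is a minimal sketch of GradNorm's task-weight update written against the paper rather than the code in this repo; the model, losses, and hyperparameters (`shared`, `heads`, `alpha`, the learning rate) are placeholders.

```python
# Minimal GradNorm sketch (not the exact code in this repo): compute per-task
# gradient norms on the last shared layer and update the task weights w_i.
import torch
import torch.nn.functional as F

alpha = 1.5                                    # asymmetry hyperparameter from the paper
shared = torch.nn.Linear(16, 16)               # last shared layer W
heads = torch.nn.ModuleList([torch.nn.Linear(16, 1) for _ in range(2)])
w = torch.nn.Parameter(torch.ones(2))          # task weights, initialised to 1
opt_w = torch.optim.Adam([w], lr=1e-2)

x, targets = torch.randn(8, 16), [torch.randn(8, 1), torch.randn(8, 1)]
feat = shared(x)
losses = torch.stack([F.mse_loss(h(feat), t) for h, t in zip(heads, targets)])
initial_losses = losses.detach()               # L_i(0): record at the first iteration

# G_i = || grad_W (w_i * L_i) || on the shared weights; create_graph=True lets
# the GradNorm loss below backpropagate into w
G = torch.stack([
    torch.autograd.grad(w[i] * losses[i], shared.weight,
                        retain_graph=True, create_graph=True)[0].norm()
    for i in range(len(losses))
])
G_avg = G.mean().detach()

# relative inverse training rates r_i and the GradNorm loss (target is a constant)
ratio = losses.detach() / initial_losses
r = ratio / ratio.mean()
gradnorm_loss = (G - G_avg * r.pow(alpha)).abs().sum()

opt_w.zero_grad()
gradnorm_loss.backward()
opt_w.step()
```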

GradNorm issues (5)

Hi, I use GradNorm in my segmentation and classification task. I want to use DistributedDataParallel to train it, but I get the error: "RuntimeError: derivative for batch_norm_backward_elemt is...

Hi author, I referred to your code, which includes the GradNorm part, and rewrote it for my own transformer-based model training. Everything is good, but as the iteration count grows, the...

![image](https://user-images.githubusercontent.com/24848148/163394875-8f541169-8510-4755-9b14-15d0c03bae5b.png) l1 and l2 are not weighted by w
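For context, in the paper the network itself is trained on the weighted sum of the task losses, so `l1` and `l2` would each be scaled by the corresponding entry of `w` before being added. A tiny sketch with placeholder tensors (the names `l1`, `l2`, `w` mirror the issue, not this repo's code):

```python
import torch

# placeholders standing in for the per-task losses and task weights in the issue
w = torch.nn.Parameter(torch.ones(2))
l1 = torch.randn((), requires_grad=True)
l2 = torch.randn((), requires_grad=True)

# total loss used to train the shared network: sum_i w_i * L_i
total_loss = w[0] * l1 + w[1] * l2
total_loss.backward()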

Hi, I am a bit confused about the update process of `w`. In the paper, only the sum of `w` is constrained to equal `task_num`, but nothing prevents...
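As far as I understand the paper, the only explicit constraint is a renormalisation applied after each update so that the weights sum to `task_num`; it does not by itself stop an individual weight from shrinking toward zero, which seems to be the concern here. A minimal sketch of that step (the values below are made up):

```python
import torch

task_num = 2
w = torch.nn.Parameter(torch.tensor([1.9, 0.6]))  # weights after a gradient step

# renormalisation from the paper: rescale so that sum_i w_i == task_num;
# only the sum is constrained, so one weight can still drift toward zero
with torch.no_grad():
    w.data = task_num * w.data / w.data.sum()
print(w)  # tensor([1.5200, 0.4800], requires_grad=True), sums to 2
```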

File "main_simmim_pt.py", line 302, in train_one_epoch G1R = torch.autograd.grad(L1, param[0].clone(), retain_graph=True, create_graph=True) File "D:\txj\envs\swin2\lib\site-packages\torch\autograd\__init__.py", line 236, in grad inputs, allow_unused, accumulate_grad=False) RuntimeError: One of the differentiated Tensors appears to not...