
About loss being NaN, lr_scheduler.step(), optimizer.step()

Open FlyDre opened this issue 3 years ago • 2 comments

I've read issue #44. As in that case, I replaced ResNet50 with another backbone, so I checked the link you mentioned: https://discuss.pytorch.org/t/optimizer-step-before-lr-scheduler-step-error-using-gradscaler/92930/7

I therefore changed the code as shown below:

[image: change]

However, all three losses are still NaN, and the warning "UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate" still appears.
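For reference, the workaround discussed in that PyTorch forum thread can be sketched roughly as follows. This is a minimal sketch, not the STCN training loop: the model, data, optimizer and `StepLR` schedule are toy stand-ins, and it assumes the scheduler is stepped once per iteration. The idea is that `GradScaler` skips `optimizer.step()` when the scaled gradients contain inf/NaN and then lowers its scale, so the scheduler is only advanced when the scale did not drop.

```python
import torch
from torch import nn

# Toy stand-ins (hypothetical): the real model, loss and data come from the training script.
device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"  # on CPU the scaler is disabled and acts as a pass-through

model = nn.Linear(4, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)


def train_step(x, y):
    """Run one AMP step; return (loss value, whether the optimizer step was skipped)."""
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device, enabled=use_amp):
        loss = nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()

    scale_before = scaler.get_scale()
    scaler.step(optimizer)  # skipped internally if the gradients contain inf/NaN
    scaler.update()         # lowers the scale after a skipped step
    skipped = scaler.get_scale() < scale_before

    if not skipped:
        scheduler.step()    # advance the schedule only after a real optimizer step
    return loss.item(), skipped


x = torch.randn(8, 4, device=device)
y = torch.randn(8, 1, device=device)
loss, skipped = train_step(x, y)
```

With this ordering the warning should no longer be triggered by skipped steps, but it does nothing about NaN losses by itself: if every step is skipped, the loss was already non-finite before the optimizer was reached.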

[image: problem]

Is this normal or do I need to modify losses.py?

Thank you.

FlyDre avatar Mar 23 '22 08:03 FlyDre

The warning can be ignored. It doesn't matter. I think the problem is in the backbone (I see that you are using ConvNext). I've also tried ConvNext, but

  1. It gives NaN unless I turn off amp (--no_amp)
  2. Even when it is working, it converges much more slowly than ResNet50

I would love to learn why, or whether you have a solution.
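The thread does not establish why the NaNs appear, but one generic way to narrow it down is to check which module first produces a non-finite output. The sketch below is an assumption-laden debugging aid, not part of STCN: `attach_nonfinite_hooks` and the `Overflow` layer are hypothetical names, and only plain tensor outputs are checked. In a real run the hooks would be attached to the backbone and the forward pass executed under autocast.

```python
import torch
from torch import nn


def attach_nonfinite_hooks(model):
    """Record names of submodules whose output contains NaN/inf, in forward order."""
    offenders = []
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            # Only plain tensor outputs are checked in this sketch.
            if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
                offenders.append(name)
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root module, whose name is ""
            handles.append(module.register_forward_hook(make_hook(name)))
    return offenders, handles


class Overflow(nn.Module):
    """Hypothetical layer that overflows on purpose, standing in for a bad block."""

    def forward(self, x):
        return x + float("inf")


toy = nn.Sequential(nn.Linear(4, 4), Overflow(), nn.ReLU())
offenders, handles = attach_nonfinite_hooks(toy)
toy(torch.ones(2, 4))
print(offenders)  # the first entry is the earliest module with a non-finite output

for handle in handles:
    handle.remove()  # detach the hooks once debugging is done
```

If the first offender only shows up with amp enabled, that points at a float16 range problem in that block rather than at the loss functions.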

hkchengrex avatar Mar 23 '22 18:03 hkchengrex

Thanks for replying. I'll update this issue if I figure it out.

FlyDre avatar Mar 24 '22 07:03 FlyDre