lion-pytorch

Always getting NaNs in long training

Open danbochman opened this issue 1 year ago • 5 comments

I've been experimenting with the LION optimizer in your other (great) Imagen repository. I can share my anecdotal experience and the combinations I tried:

  • Models of different sizes 0.2B, 0.7B and 1B params.
  • Betas such as beta1 0.95 and beta2 0.98
  • Learning rates 1e-4, 3e-5 and 1e-5.
  • Triton kernel set to both True and False.

Training was indeed fast, but unfortunately every run eventually ended in NaNs.

I think a potential issue could be how LION interacts with a warmup schedule. I always used warmup, but I am not sure whether you are supposed to with this optimizer.

Dec 05 '23 11:12 danbochman

I have the same problem :(

Jan 03 '24 06:01 ysesst93013

Same NaN issue with the CosineAnnealing scheduler after the first epoch.

Jan 28 '24 00:01 SergeySakharovskiy

May I know the learning rate schedule you are using?

Jan 29 '24 22:01 xiangning-chen

Same issue. I set a large weight decay to avoid it. I suppose that 'update = sign * lr' keeps enlarging abs(parameter) as long as the sign does not change.
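
A minimal sketch of the mechanism suggested above, in plain Python on a single scalar parameter. It assumes the standard Lion update with decoupled weight decay. The constant negative gradient, the toy `lr` of 1e-2, the weight decay of 1.0, and the helper names `lion_step` and `simulate` are illustrative assumptions, not settings or code from this thread. It only illustrates the growth argument and is not a confirmed explanation of the NaNs.

```python
def sign(x: float) -> int:
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def lion_step(param, grad, momentum, lr, beta1, beta2, weight_decay):
    """One Lion step on a scalar (sketch; assumes decoupled weight decay)."""
    # Decoupled weight decay shrinks the parameter toward zero.
    param = param * (1.0 - lr * weight_decay)
    # The update direction is only the sign of the interpolated momentum,
    # so every step has magnitude lr regardless of the gradient size.
    update = sign(beta1 * momentum + (1.0 - beta1) * grad)
    param = param - lr * update
    # The momentum buffer is refreshed with the second beta.
    momentum = beta2 * momentum + (1.0 - beta2) * grad
    return param, momentum


def simulate(steps, weight_decay, lr=1e-2, beta1=0.95, beta2=0.98):
    """Run Lion on a toy problem whose gradient never changes sign."""
    param, momentum = 0.0, 0.0
    for _ in range(steps):
        grad = -1.0  # hypothetical constant-sign gradient
        param, momentum = lion_step(
            param, grad, momentum, lr, beta1, beta2, weight_decay
        )
    return param


no_decay = simulate(10_000, weight_decay=0.0)
with_decay = simulate(10_000, weight_decay=1.0)
print(f"without weight decay: {no_decay:.4f}")
print(f"with weight decay:    {with_decay:.4f}")
```

In this toy setting the run without weight decay moves by `lr` on every step and keeps growing linearly, while the run with weight decay settles near `1 / weight_decay`, where the shrinkage balances the fixed-size sign step.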

Mar 12 '24 04:03 zjutzyl

Same here. Sudden NaN losses during a 100-epoch training run with OneCycleLR and clipping.

Apr 16 '24 04:04 lindakasabian