lion-pytorch
Always getting NaNs in long training
I've been experimenting with the LION optimizer in your other (great) Imagen repository. I can share my anecdotal experience and combinations:
- Models of different sizes: 0.2B, 0.7B, and 1B params.
- Betas such as `beta1 = 0.95` and `beta2 = 0.98`.
- Learning rates `1e-4`, `3e-5`, and `1e-5`.
- Triton kernel turned both `True` and `False`.
Training was indeed fast, but unfortunately it always ended up yielding NaNs eventually.

I think a potential issue could be how LION interacts with a warmup schedule; I'm not sure whether you're supposed to use warmup with this optimizer (which I always did). A rough sketch of the setup is below.
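
For concreteness, here is a minimal sketch of that kind of setup: `Lion` from `lion-pytorch` with one of the beta/lr combinations above, plus a linear warmup via `LambdaLR`. The `weight_decay` and `warmup_steps` values are placeholders I picked for the example, not numbers from my runs, and the model/loss are stand-ins.

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import LambdaLR
from lion_pytorch import Lion

model = nn.Linear(512, 512)            # stand-in for the real model

# One of the combinations above: betas (0.95, 0.98), lr 3e-5.
optimizer = Lion(
    model.parameters(),
    lr=3e-5,
    betas=(0.95, 0.98),
    weight_decay=1e-2,                 # placeholder value
    use_triton=False,                  # tried both True and False
)

# Linear warmup over the first `warmup_steps` steps, then a constant lr.
warmup_steps = 1000                    # placeholder value
scheduler = LambdaLR(optimizer, lambda step: min(1.0, (step + 1) / warmup_steps))

for step in range(10):                 # a few dummy steps
    batch = torch.randn(16, 512)
    loss = model(batch).pow(2).mean()  # placeholder loss
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```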
I have the same problem :(
Same NaN issue here with a CosineAnnealing scheduler, starting after the first epoch.
May I know the learning rate schedule you are using?
Same issue. I set a large weight decay to avoid it. I suspect the problem is that the update `sign(...) * lr` keeps enlarging `abs(parameter)` as long as the sign does not change.
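
To illustrate that argument with a toy scalar recurrence (not the library's actual implementation; the numbers are made up): with a constant update sign, the parameter grows linearly in `lr`, while decoupled weight decay caps its magnitude near `1 / weight_decay`.

```python
lr, wd = 3e-5, 0.5        # hypothetical "big" weight decay
p_plain = 1.0             # sign-style update, no weight decay
p_decay = 1.0             # sign-style update + decoupled weight decay

for _ in range(200_000):
    s = 1.0                                      # assume the sign never flips
    p_plain = p_plain + lr * s                   # grows linearly, without bound
    p_decay = p_decay * (1 - lr * wd) + lr * s   # settles near lr / (lr * wd) = 1 / wd

print(p_plain)   # ~7.0 and still growing
print(p_decay)   # ~1.95, approaching 1 / wd = 2.0
```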
Same here: sudden NaN losses during a 100-epoch training run with OneCycleLR and gradient clipping.
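
For reference, roughly this kind of loop (the hyperparameters and model here are placeholders, not my exact setup):

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import OneCycleLR
from lion_pytorch import Lion

model = nn.Linear(256, 256)                 # stand-in for the real model
epochs, steps_per_epoch = 100, 500          # placeholder sizes

optimizer = Lion(model.parameters(), lr=1e-4, weight_decay=1e-2)  # placeholder hparams
scheduler = OneCycleLR(optimizer, max_lr=1e-4, epochs=epochs, steps_per_epoch=steps_per_epoch)

for step in range(epochs * steps_per_epoch):
    batch = torch.randn(32, 256)
    loss = model(batch).pow(2).mean()       # placeholder loss
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```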
