lion-pytorch
Always getting NaNs in long training
I've been experimenting with the LION optimizer in your other (great) Imagen repository. I can share my anecdotal experience and combinations:
- Models of different sizes: 0.2B, 0.7B, and 1B params.
- Betas such as `beta1 = 0.95` and `beta2 = 0.98`.
- Learning rates of `1e-4`, `3e-5`, and `1e-5`.
- Triton kernel turned both `True` and `False`.
Training was indeed fast, but unfortunately it always ended up yielding NaNs.
I think a potential issue could be how LION interacts with a warmup schedule; I am not sure if you're supposed to do warmup with this optimizer or not (which I always did).
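For context, here is a minimal sketch of the kind of setup described above, assuming the `Lion` constructor arguments documented in this repo's README (`lr`, `betas`, `weight_decay`, `use_triton`); the warmup length, weight decay value, and dummy model are placeholders, not values from the report.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR
from lion_pytorch import Lion

model = torch.nn.Linear(512, 512)   # stand-in for the actual network

optimizer = Lion(
    model.parameters(),
    lr = 1e-4,                      # one of the learning rates tried above
    betas = (0.95, 0.98),           # the betas tried above
    weight_decay = 1e-2,            # placeholder value
    use_triton = False,             # the Triton kernel was tried both ways
)

warmup_steps = 1000                 # placeholder warmup length

# Linear warmup from 0 to the base lr, then constant afterwards.
scheduler = LambdaLR(optimizer, lr_lambda = lambda step: min(1.0, (step + 1) / warmup_steps))

for step in range(10_000):
    loss = model(torch.randn(8, 512)).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()
```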
I have the same problem :(
Same NaN issue with a CosineAnnealing scheduler after the first epoch.
May I know the learning rate schedule you are using?
Same issue; I set a large weight decay to avoid it. I suspect that the update `update = sign * lr` keeps enlarging `abs(parameter)` as long as the sign doesn't change.
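To make that argument concrete, here is a toy sketch of a sign-style update with decoupled weight decay on a single scalar. This is just the update rule written out, not the repo's implementation, and the numbers are illustrative only.

```python
# A Lion-style sign update with a constant sign grows |parameter| linearly
# unless decoupled weight decay pulls it back toward a fixed point.
lr = 1e-4
wd_values = [0.0, 1.0]        # no decay vs. a large decoupled decay

for wd in wd_values:
    p = 1.0                   # a single scalar "parameter"
    sign = 1.0                # assume the sign of the update never flips
    for _ in range(100_000):
        p = p * (1 - lr * wd) # decoupled weight decay, as in AdamW/Lion
        p = p + lr * sign     # sign update: magnitude is always lr
    print(f"wd={wd}: p={p:.3f}")

# With wd=0 the parameter drifts from 1.0 to ~11; with the large decay it
# settles near the fixed point (lr * sign) / (lr * wd) = 1 / wd = 1.0.
```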
Same here. Sudden NaN losses during 100-epoch training with OneCycleLR and gradient clipping.
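Not a fix, but for reference a sketch of the OneCycleLR + clipping setup being described, with an added guard that skips steps whose loss is already non-finite. The `max_norm`, step count, and dummy model are assumptions, not values from the report above.

```python
import torch
from torch.optim.lr_scheduler import OneCycleLR
from lion_pytorch import Lion

model = torch.nn.Linear(512, 512)   # stand-in for the actual network
optimizer = Lion(model.parameters(), lr = 3e-5, betas = (0.95, 0.98))

total_steps = 10_000                # placeholder step count
scheduler = OneCycleLR(optimizer, max_lr = 3e-5, total_steps = total_steps)

for step in range(total_steps):
    loss = model(torch.randn(8, 512)).pow(2).mean()

    # Skip the step entirely if the loss has already gone non-finite,
    # so a single bad batch does not poison the optimizer state.
    if not torch.isfinite(loss):
        optimizer.zero_grad()
        continue

    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm = 1.0)
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()
```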