lion-pytorch issues

Same amount of VRAM is taken as in AdamW

6

One of the main benefits of LION is that it needs to save less data for each param. Adam needs to save the momentum and RMSProp EMAs, while in LION we need...

VCasecnikovs
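To make the state-size argument in the excerpt above concrete, here is a minimal sketch of one update step for each optimizer on a single scalar parameter. It follows the update rules as published for Lion and AdamW. It is not the code in `lion_pytorch`, and the function names `lion_step` and `adamw_step` are hypothetical. The point is only to show which buffers each rule carries between steps: one (`exp_avg`) for Lion, two (`exp_avg`, `exp_avg_sq`) for AdamW.

```python
import math


def lion_step(param, grad, exp_avg, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """One Lion update for a scalar parameter (illustrative sketch, hypothetical helper).

    Persistent state: exp_avg only.
    """
    # Interpolate momentum and current gradient, then keep only the sign.
    c = beta1 * exp_avg + (1 - beta1) * grad
    update = (c > 0) - (c < 0)  # sign(c) as -1, 0 or 1
    # Decoupled weight decay followed by the signed step.
    param = param * (1 - lr * weight_decay) - lr * update
    # The momentum buffer is refreshed with a second coefficient.
    exp_avg = beta2 * exp_avg + (1 - beta2) * grad
    return param, exp_avg


def adamw_step(param, grad, exp_avg, exp_avg_sq, step,
               lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.0):
    """One AdamW update for a scalar parameter (illustrative sketch, hypothetical helper).

    Persistent state: exp_avg and exp_avg_sq (step is a 1-based counter).
    """
    exp_avg = beta1 * exp_avg + (1 - beta1) * grad
    exp_avg_sq = beta2 * exp_avg_sq + (1 - beta2) * grad * grad
    # Bias-corrected first and second moment estimates.
    m_hat = exp_avg / (1 - beta1 ** step)
    v_hat = exp_avg_sq / (1 - beta2 ** step)
    param = param * (1 - lr * weight_decay)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, exp_avg, exp_avg_sq


if __name__ == "__main__":
    p, m = lion_step(1.0, 0.5, 0.0, lr=0.1)
    print("lion :", p, "state buffers: 1")
    p, m, v = adamw_step(1.0, 0.5, 0.0, 0.0, step=1, lr=0.1)
    print("adamw:", p, "state buffers: 2")
```

In terms of optimizer state alone, this suggests roughly one extra parameter-sized buffer for Lion versus two for AdamW. Total VRAM also includes weights, gradients and activations, so the overall saving may be much smaller than the ratio of state buffers.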

Always getting NaNs in long training

5

I've been experimenting with the LION optimizer in your other (great) Imagen repository. I can share my anecdotal experience and combinations: - Models of different sizes 0.2B, 0.7B and 1B...

danbochman

Using Triton with PyTorch 2.0 for AMP training results in tensors containing inf values.

Hi, thx for your great work! I set `use_triton=True`, and turned on automatic mixed precision training, but `inf` appeared in the results. Does the `lion_pytorch/triton.py` need to consider `bf16` or...

DrRyanHuang

AMD ROCm versions

PyTorch has AMD ROCm builds. How can lion-pytorch use those?

bennmann

Convergence guarantees for Lion

Thank you so much for your great implementations! My collaborators and I have recently posted a manuscript online (available at [https://arxiv.org/abs/2307.10053](https://arxiv.org/abs/2307.10053)) that provides convergence guarantees for the Lion optimizer, especially in the...

xnchxy

Do you have the actual weights trained from the paper?

tr1-ai

add an 8-bit version with bitsandbytes

5

https://github.com/TimDettmers/bitsandbytes/blob/main/compile_from_source.md

lucidrains

enhancement

Learning rate scaling for distributed training?

4

Hi @lucidrains, thanks for this implementation. I wonder if you're using distributed training for your [experiments](https://wandb.ai/lucidrains/lion-test/reports/Lion--VmlldzozNTY0OTQ0?accessToken=wxt5ha81c05k26zq01b51j3ondpzsfd1sfmng8x94g16vul5gnxq32zcjdzp5oel). If so, [as noted in Accelerate's docs](https://huggingface.co/docs/accelerate/concept_guides/performance#learning-rates), do you scale your learning rate (on...

RahulBhalley
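For readers unfamiliar with the scaling the question above refers to, the linear rule multiplies a single-process learning rate by the number of processes. The sketch below shows only that arithmetic. The helper name `scale_learning_rate` is hypothetical, and whether this rule suits Lion's sign-based update is exactly what the issue asks, so treat it as an illustration of the rule rather than a recommendation.

```python
def scale_learning_rate(base_lr: float, num_processes: int) -> float:
    """Linear learning-rate scaling (hypothetical helper, not part of lion-pytorch).

    base_lr is the rate tuned for one process; the result is the rate the
    linear rule would suggest when the effective batch is num_processes
    times larger.
    """
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    return base_lr * num_processes


if __name__ == "__main__":
    # Example: a rate tuned on one GPU, moved to 8 data-parallel processes.
    print(scale_learning_rate(1e-4, 8))
```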
