Galore-pytorch

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection


Work-in-progress, unofficial PyTorch implementation of GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection (Zhao et al., 2024).
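A minimal sketch of the core idea for a single weight matrix, assuming an Adam-based update. Every `update_proj_gap` steps, the top-`rank` singular vectors of the current gradient define a projection. The gradient is projected into that subspace, Adam's two moment buffers are kept there (this is where the memory saving comes from), and the normalized update is projected back to full size and multiplied by `scale`. The class `GaLoreAdamSketch` and its arguments are hypothetical illustrations, not this repository's API. This sketch projects along the smaller side of the matrix.

```python
import torch


class GaLoreAdamSketch:
    """Hypothetical single-matrix GaLore-style Adam optimizer (illustration only)."""

    def __init__(self, param, rank=4, update_proj_gap=200, scale=0.25,
                 lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        self.param = param
        self.rank = rank
        self.update_proj_gap = update_proj_gap
        self.scale = scale
        self.lr = lr
        self.beta1, self.beta2 = betas
        self.eps = eps
        self.step_count = 0
        self.proj = None        # orthonormal basis: m x r (left) or n x r (right)
        self.exp_avg = None     # Adam first moment, kept in the low-rank space
        self.exp_avg_sq = None  # Adam second moment, kept in the low-rank space

    def _project_left(self):
        # Project along the smaller dimension so the optimizer states stay small.
        m, n = self.param.shape
        return m <= n

    def _project(self, grad):
        return self.proj.T @ grad if self._project_left() else grad @ self.proj

    def _project_back(self, low_rank):
        return self.proj @ low_rank if self._project_left() else low_rank @ self.proj.T

    @torch.no_grad()
    def step(self):
        grad = self.param.grad
        if self.step_count % self.update_proj_gap == 0:
            # Refresh the subspace from the top-`rank` singular vectors of the gradient.
            U, _, Vh = torch.linalg.svd(grad, full_matrices=False)
            self.proj = U[:, :self.rank] if self._project_left() else Vh[:self.rank].T
        self.step_count += 1

        low_rank_grad = self._project(grad)
        if self.exp_avg is None:
            self.exp_avg = torch.zeros_like(low_rank_grad)
            self.exp_avg_sq = torch.zeros_like(low_rank_grad)

        # Standard Adam moment updates with bias correction, in the low-rank space.
        self.exp_avg.mul_(self.beta1).add_(low_rank_grad, alpha=1 - self.beta1)
        self.exp_avg_sq.mul_(self.beta2).addcmul_(low_rank_grad, low_rank_grad,
                                                  value=1 - self.beta2)
        m_hat = self.exp_avg / (1 - self.beta1 ** self.step_count)
        v_hat = self.exp_avg_sq / (1 - self.beta2 ** self.step_count)
        low_rank_update = m_hat / (v_hat.sqrt() + self.eps)

        # Project the update back to full size and apply it with the extra scale factor.
        self.param.add_(self._project_back(low_rank_update), alpha=-self.lr * self.scale)
```

For an 8 × 16 weight with `rank=4`, `exp_avg` and `exp_avg_sq` are 4 × 16 instead of 8 × 16; in general the two buffers shrink from 2mn to 2r·max(m, n) entries. This sketch keeps the moment buffers unchanged when the subspace is refreshed.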

Roadmap

  • [x] layer-wise training tricks
  • [x] sample training loop
  • [ ] add training logs on toy data
  • [ ] train on real data

Reference

@article{zhao2024galore,
  title   = {GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection},
  author  = {Jiawei Zhao and Zhenyu Zhang and Beidi Chen and Zhangyang Wang and Anima Anandkumar and Yuandong Tian},
  year    = {2024},
  journal = {arXiv preprint arXiv:2403.03507}
}