Galore-pytorch
[Unofficial] GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Work in progress. An unofficial PyTorch implementation of the paper "GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection".
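As a rough illustration of the idea in the title, here is a minimal sketch of how gradient low-rank projection can be combined with an Adam-style update in plain PyTorch. It is not this repository's API: the helper `galore_project`, the toy shapes, and the hyperparameters (`rank`, `scale`, `update_gap`, learning rate) are all assumptions chosen for illustration. The gradient of a weight matrix is projected onto its top singular directions, the optimizer moments are kept only in that small space, and the resulting update is projected back before being applied.

```python
import torch


def galore_project(grad: torch.Tensor, rank: int) -> torch.Tensor:
    """Hypothetical helper: projector built from the top-`rank` left singular vectors."""
    U, _, _ = torch.linalg.svd(grad, full_matrices=False)
    return U[:, :rank]  # shape (n_out, rank), orthonormal columns


torch.manual_seed(0)
n_out, n_in, batch = 64, 32, 16            # toy sizes (assumed)
W = torch.randn(n_out, n_in, requires_grad=True)
x = torch.randn(batch, n_in)
target = torch.randn(batch, n_out)

rank, update_gap = 4, 10                   # assumed: subspace size and refresh interval
lr, scale = 1e-2, 0.25                     # assumed: learning rate and update scale
beta1, beta2, eps = 0.9, 0.999, 1e-8       # standard Adam constants

# Adam moments live in the projected (rank x n_in) space, not the full (n_out x n_in) space.
m = torch.zeros(rank, n_in)
v = torch.zeros(rank, n_in)
losses = []

for step in range(1, 51):
    loss = ((x @ W.T - target) ** 2).mean()
    losses.append(loss.item())
    (grad,) = torch.autograd.grad(loss, W)

    # Refresh the projector only every `update_gap` steps.
    if (step - 1) % update_gap == 0:
        P = galore_project(grad, rank)

    R = P.T @ grad                          # low-rank gradient, shape (rank, n_in)
    m = beta1 * m + (1 - beta1) * R
    v = beta2 * v + (1 - beta2) * R ** 2
    m_hat = m / (1 - beta1 ** step)         # bias correction
    v_hat = v / (1 - beta2 ** step)
    N = m_hat / (v_hat.sqrt() + eps)        # normalized low-rank update

    with torch.no_grad():
        W -= lr * scale * (P @ N)           # project back to full size and apply
```

In this toy setup the optimizer state holds `2 * rank * n_in` values instead of `2 * n_out * n_in`, which is where the memory saving comes from. The sketch carries the old moments over unchanged when the projector is refreshed, which is a simplification.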
Roadmap
- [x] layer-wise training tricks
- [x] sample training loop
- [ ] add training logs on toy data
- [ ] train on real* data
Reference
@article{zhao2024galore,
title = {GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection},
author = {Jiawei Zhao and Zhenyu Zhang and Beidi Chen and Zhangyang Wang and Anima Anandkumar and Yuandong Tian},
year = {2024},
journal = {arXiv preprint arXiv:2403.03507}
}