ColossalAI
[FEATURE]: Integrate GaLore into ColossalAI Optimizer (Gemini/Hybrid)
Describe the feature
A recent paper, "GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection" (https://arxiv.org/pdf/2403.03507.pdf), presents a memory-efficient approach to training large language models (LLMs) based on projecting gradients into a low-rank space.
Can we integrate this technique into the ColossalAI framework?
FYI
- GaLore AdamW: https://github.com/jiaweizzhao/GaLore/blob/master/galore_torch/adamw.py
- 8-bit GaLore AdamW: https://github.com/jiaweizzhao/GaLore/blob/master/galore_torch/adamw8bit.py
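For context, here is a rough NumPy sketch of the idea as I understand it: project the gradient onto a low-rank subspace, keep the Adam moments in that small space, and project the update back. This is not the linked reference implementation or any ColossalAI API. The function name `galore_adam_step`, its parameters (`rank`, `update_gap`, `scale`), and the choice to always project from the left are my own assumptions for illustration.

```python
import numpy as np


def galore_adam_step(W, G, state, rank=2, lr=1e-3, betas=(0.9, 0.999),
                     eps=1e-8, update_gap=200, scale=1.0):
    """Hypothetical helper: one Adam step on a rank-`rank` projection of G."""
    step = state.get("step", 0)
    # Refresh the projector from the current gradient every `update_gap` steps.
    if step % update_gap == 0 or "P" not in state:
        U, _, _ = np.linalg.svd(G, full_matrices=False)
        state["P"] = U[:, :rank]          # shape (m, rank)
    P = state["P"]
    R = P.T @ G                           # low-rank gradient, shape (rank, n)
    # Adam moments are stored in the small (rank, n) space, not (m, n).
    m = state.get("m", np.zeros_like(R))
    v = state.get("v", np.zeros_like(R))
    step += 1
    m = betas[0] * m + (1 - betas[0]) * R
    v = betas[1] * v + (1 - betas[1]) * R ** 2
    m_hat = m / (1 - betas[0] ** step)    # bias correction
    v_hat = v / (1 - betas[1] ** step)
    N = m_hat / (np.sqrt(v_hat) + eps)
    state.update(step=step, m=m, v=v)
    # Project the normalized update back to the full (m, n) shape.
    return W - lr * scale * (P @ N)


rng = np.random.default_rng(0)
W = np.zeros((8, 4))
G = rng.standard_normal((8, 4))
state = {}
W_new = galore_adam_step(W, G, state, rank=2)
```

In this sketch each Adam moment has shape (rank, n) instead of (m, n), which is where the optimizer-state memory saving would come from.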
Could anyone from the ColossalAI team take a look?
Thanks! We will take a look.
I will take multiple looks
I see the MR, that's awesome! When can we use it?
I plan to release it next week