16 comments by Jiawei Zhao

Can you share a bit more detail about the problem you are facing? If you want to try fine-tuning Phi-2 with GaLore, I would suggest using https://github.com/hiyouga/LLaMA-Factory, which...

For now, you can specify your checkpoint path via `args.continue_from` in `torchrun_main.py`.

@jjhoow per-layer GaLore should achieve this out of the box, as it assigns an optimizer only if `param.requires_grad` is True; see here: https://github.com/jiaweizzhao/GaLore?tab=readme-ov-file#save-weight-gradient-memory-using-per-layer-weight-updates

Hi, thanks for your question. Were you using the hyperparameters and settings provided in our paper (appendix)?

Hi, thanks for the suggestion. We didn't include reprojection in the paper but will try to implement it in the repo.

That's correct. LOMO does not directly compress the gradient. GaLore should be able to compress the gradient to reduce its memory footprint (less memory is required if we disable LOMO and enable gradient accumulation)....
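For intuition on what "compressing the gradient" means here, the sketch below shows GaLore-style low-rank projection in NumPy. It is illustrative only, not the repo's implementation: a rank-`r` projector `P` is taken from the SVD of the gradient (and would be refreshed every `T` steps), the optimizer states live in the small `r x n` subspace, and the update is projected back to full size.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 64, 32, 4                  # layer shape and projection rank (toy sizes)

G = rng.standard_normal((m, n))       # full gradient, m x n
U, _, _ = np.linalg.svd(G, full_matrices=False)
P = U[:, :r]                          # rank-r projector, m x r

G_low = P.T @ G                       # compressed gradient, r x n: Adam moments
                                      # would be stored at this size instead of m x n
update = P @ G_low                    # project back to full size for the weight update
```

Per-moment optimizer memory shrinks from `m*n` to roughly `r*n` entries, which is the saving LOMO alone does not provide.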