16 comments by Jiawei Zhao

Can you share a bit more detail about the problem you are facing? If you want to try fine-tuning Phi-2 with GaLore, I would suggest using https://github.com/hiyouga/LLaMA-Factory, which...

For now, you can specify your checkpoint path via `args.continue_from` in `torchrun_main.py`.

@jjhoow per-layer GaLore should achieve this out of the box, as it assigns an optimizer only if `param.requires_grad` is True; see here: https://github.com/jiaweizzhao/GaLore?tab=readme-ov-file#save-weight-gradient-memory-using-per-layer-weight-updates

Hi, thanks for your question. Were you using the hyperparameters and settings provided in our paper (appendix)?

Hi, thanks for the suggestion. We didn't include reprojection in the paper but will try to implement it in the repo.

That's correct. LOMO does not directly compress the gradient. GaLore should be able to compress the gradient to reduce its memory footprint (less memory is required if we disable LOMO and enable gradient accumulation)....
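For intuition on what "compressing the gradient" means here, the sketch below shows GaLore-style low-rank projection in NumPy. It is illustrative only, not the repo's implementation: a rank-`r` projector `P` is taken from the SVD of the gradient (and would be refreshed every `T` steps), the optimizer states live in the small `r x n` subspace, and the update is projected back to full size.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 64, 32, 4                  # layer shape and projection rank (toy sizes)

G = rng.standard_normal((m, n))       # full gradient, m x n
U, _, _ = np.linalg.svd(G, full_matrices=False)
P = U[:, :r]                          # rank-r projector, m x r

G_low = P.T @ G                       # compressed gradient, r x n: Adam moments
                                      # would be stored at this size instead of m x n
update = P @ G_low                    # project back to full size for the weight update
```

Per-moment optimizer memory shrinks from `m*n` to roughly `r*n` entries, which is the saving LOMO alone does not provide.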