GaLore
Seems not compatible with DeepSpeed
Hello, thank you very much for such excellent work. We have conducted some experiments using Llama-Factory, and the results indicate that GaLore can significantly reduce memory usage during full-parameter...
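For context on where savings like those come from: GaLore swaps the optimizer states of the 2-D weight matrices for low-rank projected states, while everything else keeps regular states. A minimal sketch following the usage pattern in this repo's README (the toy model and the rank/update_proj_gap/scale values are illustrative, not tuned settings):

```python
import torch.nn as nn
from galore_torch import GaLoreAdamW  # pip install galore-torch

# Toy stand-in for a transformer; any nn.Module is handled the same way.
model = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))

# Only 2-D weight matrices go through GaLore; biases/norms keep plain AdamW states.
galore_params = [p for p in model.parameters() if p.requires_grad and p.dim() == 2]
regular_params = [p for p in model.parameters() if p.requires_grad and p.dim() != 2]

param_groups = [
    {"params": regular_params},
    {"params": galore_params,
     "rank": 128,             # dimension of the projected subspace
     "update_proj_gap": 200,  # refresh the projection matrix every 200 steps
     "scale": 0.25,           # scale applied to the projected update
     "proj_type": "std"},
]
optimizer = GaLoreAdamW(param_groups, lr=1e-2)
```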
Seems not compatible with DeepSpeed (same title as above)
Hi, appreciate your awesome work! When I try to use the GaLore AdamW optimizer for Gemma training, it seems that it is not compatible with DeepSpeed with ZeRO stage as...
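For anyone trying to reproduce this, the combination in question looks roughly like the sketch below. Hedged assumptions: it uses the transformers-side GaLore integration (optim="galore_adamw" with optim_target_modules, available in transformers >= 4.39) rather than building the optimizer by hand, and the ZeRO config is a minimal illustrative dict. A plausible reason for the conflict is that ZeRO stages 2/3 partition gradients and optimizer states across ranks, while GaLore's projection needs each full 2-D gradient matrix.

```python
from transformers import TrainingArguments

# Minimal illustrative ZeRO stage-2 config (normally a JSON file passed to deepspeed).
ds_zero2 = {
    "zero_optimization": {"stage": 2},
    "train_micro_batch_size_per_gpu": "auto",
}

# The setup the reports above describe as failing: GaLore optimizer + DeepSpeed ZeRO.
args = TrainingArguments(
    output_dir="out",
    optim="galore_adamw",
    optim_target_modules=["attn", "mlp"],  # substring match on module names
    deepspeed=ds_zero2,
)
```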
c4 will soon be deprecated; use allenai/c4 instead
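The fix is a one-line change to the dataset id passed to datasets.load_dataset; the streaming flag below just mirrors how pre-training scripts typically consume C4, and this is a sketch rather than the exact diff in the PR:

```python
from datasets import load_dataset

# "c4" as a bare dataset id is deprecated on the Hugging Face Hub;
# "allenai/c4" hosts the same data under its maintained namespace.
train_data = load_dataset("allenai/c4", "en", split="train", streaming=True)
```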
- Add Dockerfile based on CUDA 12.1 image
- Add compose config to build and run on GPU easily
- Add pyproject.toml with poetry dependencies
- Add poetry.lock with locked...
Hi there! Amazing research on this. We're looking to integrate GaLore into the axolotl project here: https://github.com/OpenAccess-AI-Collective/axolotl/pull/1370. One issue I ran into is that the transformers dependency pin is a bit...
Would it be possible for you to add how long each training run takes to the README? I think a lot of people who have heard about GaLore would be...
Hi, sorry if this is a stupid question, but is it possible to use the 8-bit GaLore optimiser in combination with LoRA adapters? Thanks
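Not a stupid question. Mechanically nothing prevents it: a LoRA adapter's lora_A/lora_B weights are ordinary 2-D matrices, so they can be handed to the 8-bit GaLore optimizer like any other linear weight; whether it buys much is less clear, since those matrices are already small and low-rank. A hedged sketch (assumes galore_torch's GaLoreAdamW8bit as shipped in this repo and peft's standard LoraConfig; the model and rank values are illustrative):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM
from galore_torch import GaLoreAdamW8bit  # 8-bit variant shipped in this repo

base = AutoModelForCausalLM.from_pretrained("gpt2")  # small model for illustration
lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["c_attn"], lora_dropout=0.05)
model = get_peft_model(base, lora_cfg)

# Only the LoRA matrices are trainable; they are 2-D, so GaLore can project them.
trainable_2d = [p for p in model.parameters() if p.requires_grad and p.dim() == 2]
other_trainable = [p for p in model.parameters() if p.requires_grad and p.dim() != 2]

optimizer = GaLoreAdamW8bit(
    [{"params": other_trainable},
     {"params": trainable_2d, "rank": 8, "update_proj_gap": 200,
      "scale": 0.25, "proj_type": "std"}],
    lr=2e-4,
)
```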
Adafactor originally does its own approximation of the second moment. But when GaLore is enabled, that approximation is computed on the gradient shrunken by GaLore instead of the raw...
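To make the ordering concrete: with GaLore enabled, the projector shrinks the gradient first, and the Adafactor-style row/column second-moment statistics are then taken over the projected r x n matrix rather than the raw m x n one. A simplified single-step illustration in plain PyTorch (real Adafactor keeps running averages instead of the one-shot means shown here):

```python
import torch

m, n, r = 1024, 1024, 128
grad = torch.randn(m, n)                  # raw gradient of an (m, n) weight matrix

# GaLore step 1: project the gradient into a rank-r subspace.
U, _, _ = torch.linalg.svd(grad, full_matrices=False)
P = U[:, :r]                              # (m, r) projector, refreshed every update_proj_gap steps
lowrank_grad = P.T @ grad                 # (r, n) "shrunken" gradient

# GaLore step 2: second-moment statistics are computed on the projected gradient,
# so the Adafactor-style approximation never sees the raw (m, n) gradient.
row_stats = (lowrank_grad ** 2).mean(dim=1)  # (r,) row second moments
col_stats = (lowrank_grad ** 2).mean(dim=0)  # (n,) column second moments
```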