GaLore
Seems not compatible with DeepSpeed
Hello, thank you very much for such excellent work. We have conducted some experiments using Llama-Factory, and the results indicate that GaLore can significantly reduce memory usage during full-parameter...
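For context on where savings like those come from: GaLore swaps the optimizer states of the 2-D weight matrices for low-rank projected states, while everything else keeps regular states. A minimal sketch following the usage pattern in this repo's README (the toy model and the rank/update_proj_gap/scale values are illustrative, not tuned settings):

```python
import torch.nn as nn
from galore_torch import GaLoreAdamW  # pip install galore-torch

# Toy stand-in for a transformer; any nn.Module is handled the same way.
model = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))

# Only 2-D weight matrices go through GaLore; biases/norms keep plain AdamW states.
galore_params = [p for p in model.parameters() if p.requires_grad and p.dim() == 2]
regular_params = [p for p in model.parameters() if p.requires_grad and p.dim() != 2]

param_groups = [
    {"params": regular_params},
    {"params": galore_params,
     "rank": 128,             # dimension of the projected subspace
     "update_proj_gap": 200,  # refresh the projection matrix every 200 steps
     "scale": 0.25,           # scale applied to the projected update
     "proj_type": "std"},
]
optimizer = GaLoreAdamW(param_groups, lr=1e-2)
```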
Seems not compatible with DeepSpeed (same title as above)
Hi, appreciate your awesome work! When I try to use the GaLore AdamW optimizer for Gemma training, it seems that it is not compatible with DeepSpeed with ZeRO stage as...
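For anyone trying to reproduce this, the combination in question looks roughly like the sketch below. Hedged assumptions: it uses the transformers-side GaLore integration (optim="galore_adamw" with optim_target_modules, available in transformers >= 4.39) rather than building the optimizer by hand, and the ZeRO config is a minimal illustrative dict. A plausible reason for the conflict is that ZeRO stages 2/3 partition gradients and optimizer states across ranks, while GaLore's projection needs each full 2-D gradient matrix.

```python
from transformers import TrainingArguments

# Minimal illustrative ZeRO stage-2 config (normally a JSON file passed to deepspeed).
ds_zero2 = {
    "zero_optimization": {"stage": 2},
    "train_micro_batch_size_per_gpu": "auto",
}

# The setup the reports above describe as failing: GaLore optimizer + DeepSpeed ZeRO.
args = TrainingArguments(
    output_dir="out",
    optim="galore_adamw",
    optim_target_modules=["attn", "mlp"],  # substring match on module names
    deepspeed=ds_zero2,
)
```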
c4 will soon be deprecated; use allenai/c4 instead
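The fix is a one-line change to the dataset id passed to datasets.load_dataset; the streaming flag below just mirrors how pre-training scripts typically consume C4, and this is a sketch rather than the exact diff in the PR:

```python
from datasets import load_dataset

# "c4" as a bare dataset id is deprecated on the Hugging Face Hub;
# "allenai/c4" hosts the same data under its maintained namespace.
train_data = load_dataset("allenai/c4", "en", split="train", streaming=True)
```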
- Add Dockerfile based on CUDA 12.1 image
- Add compose config to build and run on GPU easily
- Add pyproject.toml with poetry dependencies
- Add poetry.lock with locked...
Hi there! Amazing research on this. We're looking to integrate GaLore into the axolotl project here: https://github.com/OpenAccess-AI-Collective/axolotl/pull/1370. One issue I ran into is that the transformers dependency pin is a bit...
Would it be possible for you to add how long each training run takes to the README? I think a lot of people who have heard about GaLore would be...
Hi, sorry if this is a stupid question, but is it possible to use the 8-bit GaLore optimiser in combination with LoRA adapters? Thanks
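Not a stupid question. Mechanically nothing prevents it: a LoRA adapter's lora_A/lora_B weights are ordinary 2-D matrices, so they can be handed to the 8-bit GaLore optimizer like any other linear weight; whether it buys much is less clear, since those matrices are already small and low-rank. A hedged sketch (assumes galore_torch's GaLoreAdamW8bit as shipped in this repo and peft's standard LoraConfig; the model and rank values are illustrative):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM
from galore_torch import GaLoreAdamW8bit  # 8-bit variant shipped in this repo

base = AutoModelForCausalLM.from_pretrained("gpt2")  # small model for illustration
lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["c_attn"], lora_dropout=0.05)
model = get_peft_model(base, lora_cfg)

# Only the LoRA matrices are trainable; they are 2-D, so GaLore can project them.
trainable_2d = [p for p in model.parameters() if p.requires_grad and p.dim() == 2]
other_trainable = [p for p in model.parameters() if p.requires_grad and p.dim() != 2]

optimizer = GaLoreAdamW8bit(
    [{"params": other_trainable},
     {"params": trainable_2d, "rank": 8, "update_proj_gap": 200,
      "scale": 0.25, "proj_type": "std"}],
    lr=2e-4,
)
```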
Adafactor originally does its own approximation of the second moment. But when GaLore is enabled, that approximation is computed on the gradient shrunken by GaLore instead of the raw...
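To make the ordering concrete: with GaLore enabled, the projector shrinks the gradient first, and the Adafactor-style row/column second-moment statistics are then taken over the projected r x n matrix rather than the raw m x n one. A simplified single-step illustration in plain PyTorch (real Adafactor keeps running averages instead of the one-shot means shown here):

```python
import torch

m, n, r = 1024, 1024, 128
grad = torch.randn(m, n)                  # raw gradient of an (m, n) weight matrix

# GaLore step 1: project the gradient into a rank-r subspace.
U, _, _ = torch.linalg.svd(grad, full_matrices=False)
P = U[:, :r]                              # (m, r) projector, refreshed every update_proj_gap steps
lowrank_grad = P.T @ grad                 # (r, n) "shrunken" gradient

# GaLore step 2: second-moment statistics are computed on the projected gradient,
# so the Adafactor-style approximation never sees the raw (m, n) gradient.
row_stats = (lowrank_grad ** 2).mean(dim=1)  # (r,) row second moments
col_stats = (lowrank_grad ** 2).mean(dim=0)  # (n,) column second moments
```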