
Results: 46 GaLore issues, sorted by recently updated

Hi, thanks for releasing GaLore! I'm running out of memory whenever I use a sequence length longer than 512, even if I use a smaller model. I can train a...
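
A note on this one: GaLore reduces optimizer-state memory, but activation memory still grows with sequence length, which is the usual culprit for OOM at longer contexts. A minimal sketch of the common workaround, assuming a Hugging Face model (the model name below is a hypothetical placeholder):

```python
import torch
from transformers import AutoModelForCausalLM

# Sketch: trade compute for memory by recomputing activations in backward.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # hypothetical choice of model
    torch_dtype=torch.bfloat16,
)
model.gradient_checkpointing_enable()  # recompute activations during backward
model.config.use_cache = False         # KV cache is unused during training
```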

Hi Jiawei, I was trying GaLore on TinyLlama-1B using the codebase https://github.com/jzhang38/TinyLlama on 4x A800-80GB. I encountered the following error:

```
[rank1]: optimizer.step()
[rank1]: File "/mntcephfs/lab_data/zhangyushun/anaconda/tinyllama/lib/python3.10/site-packages/lightning/fabric/wrappers.py", line 74, in step...
```
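
The traceback points into Lightning Fabric's optimizer wrapper rather than GaLore itself. For comparison, the plain-PyTorch usage from the GaLore README looks roughly like this (the module-name filter and hyperparameters below are illustrative placeholders):

```python
from galore_torch import GaLoreAdamW

# GaLore is applied to 2-D weight matrices; everything else gets plain AdamW.
galore_params, regular_params = [], []
for name, p in model.named_parameters():
    if p.requires_grad and p.dim() == 2 and ("attn" in name or "mlp" in name):
        galore_params.append(p)   # gradient projected with rank-r SVD
    else:
        regular_params.append(p)  # ordinary AdamW update

param_groups = [
    {"params": regular_params},
    {"params": galore_params, "rank": 128, "update_proj_gap": 200,
     "scale": 0.25, "proj_type": "std"},
]
optimizer = GaLoreAdamW(param_groups, lr=1e-2)
```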

Thanks for the great work. One thing I'm curious about is whether it actually works well for SFT on LLMs. It isn't covered in the paper either....
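
For anyone wanting to try SFT: recent `transformers` versions (4.39+) expose GaLore optimizers through the Trainer, so a run can be configured without touching the optimizer code. A minimal sketch; the target-module regexes and hyperparameters are assumptions, not recommendations:

```python
from transformers import TrainingArguments

# Sketch: requires `galore-torch` to be installed alongside transformers.
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=4,
    learning_rate=2e-5,
    optim="galore_adamw",
    optim_target_modules=[r".*attn.*", r".*mlp.*"],  # illustrative patterns
    optim_args="rank=128, update_proj_gap=200, scale=0.25",
)
```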

The GaLore algorithm was originally designed to perform low-rank gradient approximation for matrices using Singular Value Decomposition (SVD). This pull request extends the algorithm to support general tensor decomposition, allowing...
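
For context, a sketch of the matrix case this PR generalizes, paraphrased from the GaLore paper (notation mine, not the PR's):

```latex
% Project the gradient G_t onto the span of its top-r left singular vectors:
G_t = U \Sigma V^\top \in \mathbb{R}^{m \times n},
\qquad P_t = U_{[:, :r]} \in \mathbb{R}^{m \times r}
% Optimizer states live on the projected gradient
R_t = P_t^\top G_t \in \mathbb{R}^{r \times n},
% and the applied update is projected back and scaled:
\tilde{G}_t = \alpha \, P_t \, \rho_t\!\left(P_t^\top G_t\right)
% where \rho_t is the entry-wise optimizer rule (e.g., Adam)
% and \alpha is the scale factor.
```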

```python
trainer = ORPOTrainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    # peft_config=peft_config,
    tokenizer=tokenizer,
    args=ORPOConfig(
        max_length=cutoff_len,
        max_prompt_length=cutoff_len // 2,
        beta=0.1,
        per_device_train_batch_size=micro_batch_size,
        gradient_accumulation_steps=gradient_accumulation_steps,
        warmup_steps=0,
        num_train_epochs=num_epochs,
        lr_scheduler_type="cosine",
        learning_rate=8e-6,
        bf16=True,
        logging_steps=10,
        optim="galore_adamw_8bit_layerwise",
        optim_target_modules=[r".*attn.*", r".*mlp.*"],
        optim_args="rank=1024, update_proj_gap=500, scale=0.25",
        ...
```

Hey, as mentioned in the title, the model is converted directly to BF16, without using the `torch.amp` facilities (`autocast` and gradient scaling) needed for AMP. This...
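
For reference, a minimal sketch of the contrast being raised, assuming `loader`, `optimizer`, and `loss_fn` are already defined. Note that `GradScaler` is only required for FP16; BF16 autocast normally runs unscaled:

```python
import torch

# Mixed precision via torch.amp: master weights stay in FP32 and matmuls
# are autocast to BF16, instead of casting the whole model with .bfloat16().
model = model.float()
for batch in loader:
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = loss_fn(model(batch))
    loss.backward()   # no GradScaler needed for BF16 (only FP16 requires it)
    optimizer.step()
```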

![W B Chart 5_2_2024, 3_02_11 PM](https://github.com/jiaweizzhao/GaLore/assets/21994498/5e4788b1-bf60-415d-9b60-f1daa64db9b4) To replicate the above results, run the command from the README. Machine configuration: A100 80GB, CUDA version 11.8; the other environment packages were installed following the recommendations in...

Thank you for your great work. I am trying to reproduce the results in "Benchmark 2: Fine-Tuning RoBERTa on GLUE tasks". I have gone through [this issue](https://github.com/jiaweizzhao/GaLore/issues/25), but I still...

![1](https://github.com/jiaweizzhao/GaLore/assets/145021666/711fec1b-48c1-4d9a-a02e-465865ba13a3) In the figure, **Rank = 1024** and **Rank = 512** are very close to the **baseline**, sometimes even better than the **baseline**. In response, I have the following two questions....
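
On the rank question, a rough memory accounting (my paraphrase of the paper's analysis; treat the constants as approximate) shows the trade-off the plot illustrates:

```latex
% Optimizer-state memory for one weight W \in \mathbb{R}^{m \times n}, m \le n:
\text{Adam: } 2mn
\qquad
\text{GaLore: } \underbrace{mr}_{\text{projector } P_t}
              + \underbrace{2rn}_{\text{moments of } R_t}
% Larger r (512, 1024) tracks full-rank training closely but saves less
% memory; smaller r saves more but can lag the baseline.
```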

I encountered an error; how should I resolve it?

```
[WARNING|trainer.py:1272] 2024-04-27 12:04:25,428 >> Activated GaLoRE fine-tuning, depending on your model size and hardware, the training might take a while before...
```
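
For background on what the layerwise variants do: each parameter gets its own optimizer, stepped from a per-parameter backward hook so per-layer gradients never need to be held simultaneously. A sketch mirroring the pattern in the GaLore README (requires torch >= 2.1; hyperparameters are placeholders):

```python
from galore_torch import GaLoreAdamW

# One optimizer per parameter, stored in a dict keyed by the tensor.
optimizer_dict = {}
for p in model.parameters():
    if p.requires_grad:
        galore = ({"rank": 128, "update_proj_gap": 200, "scale": 0.25,
                   "proj_type": "std"} if p.dim() == 2 else {})
        optimizer_dict[p] = GaLoreAdamW([{"params": [p], **galore}], lr=1e-2)

def optimizer_hook(p):
    # Step and clear this parameter as soon as its gradient is accumulated.
    optimizer_dict[p].step()
    optimizer_dict[p].zero_grad()

for p in model.parameters():
    if p.requires_grad:
        p.register_post_accumulate_grad_hook(optimizer_hook)
```

Because the update happens inside the backward pass, this pattern is generally incompatible with gradient accumulation over multiple steps, which is worth checking against the config above.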