
Results: 46 GaLore issues, sorted by recently updated

Hi, thanks for releasing GaLore! I'm running out of memory whenever I use a sequence length longer than 512, even if I use a smaller model. I can train a...
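
A note on this one: GaLore reduces optimizer-state memory, but activation memory still grows with sequence length, which is the usual culprit for OOM at longer contexts. A minimal sketch of the common workaround, assuming a Hugging Face model (the model name below is a hypothetical placeholder):

```python
import torch
from transformers import AutoModelForCausalLM

# Sketch: trade compute for memory by recomputing activations in backward.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # hypothetical choice of model
    torch_dtype=torch.bfloat16,
)
model.gradient_checkpointing_enable()  # recompute activations during backward
model.config.use_cache = False         # KV cache is unused during training
```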

Hi Jiawei, I was trying GaLore on TinyLlama-1B using the codebase https://github.com/jzhang38/TinyLlama on 4x A800-80GB. I encountered the following error:

```
[rank1]: optimizer.step()
[rank1]: File "/mntcephfs/lab_data/zhangyushun/anaconda/tinyllama/lib/python3.10/site-packages/lightning/fabric/wrappers.py", line 74, in step...
```
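
The traceback points into Lightning Fabric's optimizer wrapper rather than GaLore itself. For comparison, the plain-PyTorch usage from the GaLore README looks roughly like this (the module-name filter and hyperparameters below are illustrative placeholders):

```python
from galore_torch import GaLoreAdamW

# GaLore is applied to 2-D weight matrices; everything else gets plain AdamW.
galore_params, regular_params = [], []
for name, p in model.named_parameters():
    if p.requires_grad and p.dim() == 2 and ("attn" in name or "mlp" in name):
        galore_params.append(p)   # gradient projected with rank-r SVD
    else:
        regular_params.append(p)  # ordinary AdamW update

param_groups = [
    {"params": regular_params},
    {"params": galore_params, "rank": 128, "update_proj_gap": 200,
     "scale": 0.25, "proj_type": "std"},
]
optimizer = GaLoreAdamW(param_groups, lr=1e-2)
```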

Thanks for the great work. One thing I'm curious about is whether it actually works well for SFT on LLMs. It isn't covered in the paper either....
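
For anyone wanting to try SFT: recent `transformers` versions (4.39+) expose GaLore optimizers through the Trainer, so a run can be configured without touching the optimizer code. A minimal sketch; the target-module regexes and hyperparameters are assumptions, not recommendations:

```python
from transformers import TrainingArguments

# Sketch: requires `galore-torch` to be installed alongside transformers.
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=4,
    learning_rate=2e-5,
    optim="galore_adamw",
    optim_target_modules=[r".*attn.*", r".*mlp.*"],  # illustrative patterns
    optim_args="rank=128, update_proj_gap=200, scale=0.25",
)
```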

The GaLore algorithm was originally designed to perform low-rank gradient approximation for matrices using Singular Value Decomposition (SVD). This pull request extends the algorithm to support general tensor decomposition, allowing...
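
For context, a sketch of the matrix case this PR generalizes, paraphrased from the GaLore paper (notation mine, not the PR's):

```latex
% Project the gradient G_t onto the span of its top-r left singular vectors:
G_t = U \Sigma V^\top \in \mathbb{R}^{m \times n},
\qquad P_t = U_{[:, :r]} \in \mathbb{R}^{m \times r}
% Optimizer states live on the projected gradient
R_t = P_t^\top G_t \in \mathbb{R}^{r \times n},
% and the applied update is projected back and scaled:
\tilde{G}_t = \alpha \, P_t \, \rho_t\!\left(P_t^\top G_t\right)
% where \rho_t is the entry-wise optimizer rule (e.g., Adam)
% and \alpha is the scale factor.
```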

```python
trainer = ORPOTrainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    # peft_config=peft_config,
    tokenizer=tokenizer,
    args=ORPOConfig(
        max_length=cutoff_len,
        max_prompt_length=cutoff_len // 2,
        beta=0.1,
        per_device_train_batch_size=micro_batch_size,
        gradient_accumulation_steps=gradient_accumulation_steps,
        warmup_steps=0,
        num_train_epochs=num_epochs,
        lr_scheduler_type="cosine",
        learning_rate=8e-6,
        bf16=True,
        logging_steps=10,
        optim="galore_adamw_8bit_layerwise",
        optim_target_modules=[r".*attn.*", r".*mlp.*"],
        optim_args="rank=1024, update_proj_gap=500, scale=0.25",
        ...
```

Hey, as mentioned in the title, the model is converted directly to BF16, without using the `torch.amp` facilities (`autocast` and gradient scaling) needed for AMP. This...
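
For reference, a minimal sketch of the contrast being raised, assuming `loader`, `optimizer`, and `loss_fn` are already defined. Note that `GradScaler` is only required for FP16; BF16 autocast normally runs unscaled:

```python
import torch

# Mixed precision via torch.amp: master weights stay in FP32 and matmuls
# are autocast to BF16, instead of casting the whole model with .bfloat16().
model = model.float()
for batch in loader:
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = loss_fn(model(batch))
    loss.backward()   # no GradScaler needed for BF16 (only FP16 requires it)
    optimizer.step()
```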

![W B Chart 5_2_2024, 3_02_11 PM](https://github.com/jiaweizzhao/GaLore/assets/21994498/5e4788b1-bf60-415d-9b60-f1daa64db9b4) To replicate the above results, run the command from the README. Machine configuration: A100 80GB, CUDA version 11.8; the other environment packages were installed following the recommendations in...

Thank you for your great work. I am trying to reproduce the results in "Benchmark 2: Fine-Tuning RoBERTa on GLUE tasks". I have gone through [this issue](https://github.com/jiaweizzhao/GaLore/issues/25), but I still...

![1](https://github.com/jiaweizzhao/GaLore/assets/145021666/711fec1b-48c1-4d9a-a02e-465865ba13a3) In the figure, **Rank = 1024** and **Rank = 512** are very close to the **baseline**, sometimes even better than the **baseline**. In response, I have the following two questions....
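
On the rank question, a rough memory accounting (my paraphrase of the paper's analysis; treat the constants as approximate) shows the trade-off the plot illustrates:

```latex
% Optimizer-state memory for one weight W \in \mathbb{R}^{m \times n}, m \le n:
\text{Adam: } 2mn
\qquad
\text{GaLore: } \underbrace{mr}_{\text{projector } P_t}
              + \underbrace{2rn}_{\text{moments of } R_t}
% Larger r (512, 1024) tracks full-rank training closely but saves less
% memory; smaller r saves more but can lag the baseline.
```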

I encountered an error; how should I resolve it?

```
[WARNING|trainer.py:1272] 2024-04-27 12:04:25,428 >> Activated GaLoRE fine-tuning, depending on your model size and hardware, the training might take a while before...
```
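
For background on what the layerwise variants do: each parameter gets its own optimizer, stepped from a per-parameter backward hook so per-layer gradients never need to be held simultaneously. A sketch mirroring the pattern in the GaLore README (requires torch >= 2.1; hyperparameters are placeholders):

```python
from galore_torch import GaLoreAdamW

# One optimizer per parameter, stored in a dict keyed by the tensor.
optimizer_dict = {}
for p in model.parameters():
    if p.requires_grad:
        galore = ({"rank": 128, "update_proj_gap": 200, "scale": 0.25,
                   "proj_type": "std"} if p.dim() == 2 else {})
        optimizer_dict[p] = GaLoreAdamW([{"params": [p], **galore}], lr=1e-2)

def optimizer_hook(p):
    # Step and clear this parameter as soon as its gradient is accumulated.
    optimizer_dict[p].step()
    optimizer_dict[p].zero_grad()

for p in model.parameters():
    if p.requires_grad:
        p.register_post_accumulate_grad_hook(optimizer_hook)
```

Because the update happens inside the backward pass, this pattern is generally incompatible with gradient accumulation over multiple steps, which is worth checking against the config above.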