Results: 46 GaLore issues, sorted by recently updated

Dear Author, I am truly grateful for your outstanding work. Please allow me to raise a small question regarding gradient memory: as I understand it, the LOMO method...

Hi, great project. After reading the paper and the implementation, I am wondering whether you have considered reprojecting the Adam internal states (exp_avg, exp_avg_sq) from the previous subspace to the...
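A reprojection like the one asked about could look roughly like the minimal sketch below. It assumes a GaLore-style left projection where the optimizer state lives in the coordinates of an orthonormal m×r basis `P`, so the full-space first moment is approximated by `P @ exp_avg`. Switching to a new basis then gives `exp_avg_new = P_new^T @ P_old @ exp_avg`. The helper `reproject_first_moment` and the toy matrices are hypothetical illustrations, not GaLore's code. The second moment (exp_avg_sq) is an elementwise average of squares and does not transform linearly, so it is deliberately left out.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]


def reproject_first_moment(exp_avg, p_old, p_new):
    """Hypothetical helper: map an r x n first-moment state from span(p_old) to span(p_new).

    Lifts the state back to the full m x n space with p_old (m x r),
    then projects it onto the new basis with p_new^T (r x m).
    Assumes both bases have orthonormal columns.
    """
    full = matmul(p_old, exp_avg)          # m x n approximation in full space
    return matmul(transpose(p_new), full)  # r x n state in the new subspace


# Toy example with m = 3, r = 2, n = 1.
p_old = [[1, 0], [0, 1], [0, 0]]  # spans coordinates 1 and 2
p_new = [[0, 0], [1, 0], [0, 1]]  # spans coordinates 2 and 3
exp_avg = [[5], [7]]

print(reproject_first_moment(exp_avg, p_old, p_new))  # [[7], [0]]
```

The component of the old momentum along the first coordinate is orthogonal to the new subspace and is discarded; reprojecting onto the same basis returns the state unchanged because `P^T P = I`.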

```
# Configuration parameters
model_name_or_path = "mistralai/Mistral-7B-v0.1"
max_length = 128
doc_stride = 128
pad_to_max_length = True
per_device_train_batch_size = 1
per_device_eval_batch_size = 1
learning_rate = 0.0002
weight_decay = 0.0
num_train_epochs =...
```

Deal with tensors that are distributed across different devices.

Is it possible to release the data you used to draw the loss figure (like Figure 3), or the wandb training files for both baselines and GaLore? I want to...

My model works fine with adamw_bnb_8bit. When I switched to galore_adamw_8bit with 'all-linear', an exception is raised: 'can't optimize a non-leaf'
```
Seq2SeqTrainingArguments(
    output_dir = model_name_or_path,
    save_strategy = 'no',
    logging_steps...
```
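For context, "can't optimize a non-leaf Tensor" is raised by PyTorch's `torch.optim.Optimizer` when a tensor handed to it was produced by an operation rather than created directly. The minimal sketch below reproduces that check with plain `torch.optim.AdamW`, not GaLore; the helper `optimizer_error` is a hypothetical name, and whether this is what happens with 'all-linear' in this issue is not established here.

```python
import torch


def optimizer_error(params):
    """Hypothetical helper: return AdamW's ValueError message for `params`, or None if accepted."""
    try:
        torch.optim.AdamW(params, lr=1e-3)
    except ValueError as err:
        return str(err)
    return None


weight = torch.randn(4, 4, requires_grad=True)  # leaf: created directly by the user
scaled = weight * 2.0                            # non-leaf: output of an operation on `weight`

print(weight.is_leaf, scaled.is_leaf)  # True False
print(optimizer_error([weight]))       # None
print(optimizer_error([scaled]))       # can't optimize a non-leaf Tensor
```

A quick diagnostic along these lines is to check `p.is_leaf` for every tensor in the parameter groups passed to the optimizer.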