GaLore issues

Does GaLore save gradient memory?

1

Dear Author, I am truly grateful for your outstanding work. Please allow me to raise a small question regarding gradient memory: As I understand it, the LOMO method...

jinqixiao

Why not reproject the internal Adam states during update_proj_gap?

2

Hi, great project. After reading the paper and the implementation, I am wondering whether you have considered reprojecting the Adam internal states (exp_avg, exp_avg_sq) from the previous subspace to the...

liuliu
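To make the question above concrete, here is a minimal sketch of what reprojecting the first moment could look like. It is not taken from the GaLore codebase: the helper names (`transpose`, `matmul`, `reproject_first_moment`) are hypothetical, and it assumes a left projection in which the low-rank state is stored as `P^T G` with `P` of shape m x r and orthonormal columns. Only `exp_avg` is handled, because it is a linear function of the gradients; `exp_avg_sq` accumulates elementwise squares, so the same linear map would at best be an approximation for it.

```python
# Hypothetical sketch: none of these helpers come from the GaLore codebase.

def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def reproject_first_moment(exp_avg_old, p_old, p_new):
    """Map a low-rank first moment from the old subspace into the new one.

    exp_avg_old: r x n moment stored in the old subspace (assumed layout)
    p_old, p_new: m x r projection matrices with orthonormal columns
    Returns P_new^T (P_old @ exp_avg_old), an r x n matrix.
    """
    full = matmul(p_old, exp_avg_old)      # lift back to the full m x n space
    return matmul(transpose(p_new), full)  # project into the new subspace


# Toy example with m = 3, r = 1, n = 2.
p_old = [[1.0], [0.0], [0.0]]
p_new = [[0.6], [0.8], [0.0]]
exp_avg_old = [[2.0, -1.0]]

exp_avg_new = reproject_first_moment(exp_avg_old, p_old, p_new)
```

If the subspace does not change (`p_new == p_old`), the moment comes back unchanged, which is a quick sanity check on the mapping.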

GaLore fine-tuning #stopped

```
# Configuration parameters
model_name_or_path = "mistralai/Mistral-7B-v0.1"
max_length = 128
doc_stride = 128
pad_to_max_length = True
per_device_train_batch_size = 1
per_device_eval_batch_size = 1
learning_rate = 0.0002
weight_decay = 0.0
num_train_epochs =...
```

j-datta