GaLore
ValueError: can't optimize a non-leaf Tensor (param.is_leaf=False,param.retains_grad=False)
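My model trains fine with adamw_bnb_8bit. When I switched to a GaLore 8-bit optimizer (optim="galore_adamw_8bit_layerwise" in the arguments below) with optim_target_modules='all-linear', the exception "can't optimize a non-leaf Tensor" is raised. These are my training arguments: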
import torch
from transformers import Seq2SeqTrainingArguments

# model_name_or_path, learning_rate, batch_size, max_steps and kwargs
# are defined elsewhere in my script.
Seq2SeqTrainingArguments(
    output_dir=model_name_or_path,
    save_strategy='no',
    logging_steps=100,
    bf16=torch.cuda.is_available(),
    dataloader_pin_memory=True,
    dataloader_num_workers=8,
    num_train_epochs=1,
    do_train=True,
    learning_rate=learning_rate,  # 5e-5
    # optim='adamw_bnb_8bit',  # this one works
    optim='galore_adamw_8bit_layerwise',  # this one fails with the ValueError above
    optim_target_modules='all-linear',
    lr_scheduler_type='constant',  # or 'cosine'
    warmup_ratio=0.0,
    per_device_train_batch_size=batch_size,  # 8
    gradient_accumulation_steps=1,
    report_to='none',
    do_eval=False,
    max_steps=max_steps,
    accelerator_config={'dispatch_batches': False},
    **kwargs,
)
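
Going by the error message, the optimizer seems to reject any parameter that is neither a leaf tensor nor retaining its gradient. To narrow down which parameters are affected, a small check along these lines might help. This is only a sketch: the helper name find_non_leaf_params is hypothetical (not part of torch or transformers), and it assumes `model` is the torch.nn.Module handed to the trainer.

```python
from typing import Any, Iterable, List, Tuple


def find_non_leaf_params(named_params: Iterable[Tuple[str, Any]]) -> List[str]:
    """Return names of parameters that are neither leaf tensors nor retain grad.

    Hypothetical helper: it only reads the `is_leaf` and `retains_grad`
    attributes that the error message itself reports.
    """
    rejected = []
    for name, param in named_params:
        if not (param.is_leaf or param.retains_grad):
            rejected.append(name)
    return rejected


# Usage (assumes `model` is the torch.nn.Module passed to the trainer):
#   for name in find_non_leaf_params(model.named_parameters()):
#       print("non-leaf parameter:", name)
```

If the list comes back non-empty, those names show which parameters are affected and may point to how they ended up as non-leaf tensors.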