GaLore
When I use GaLore with ORPOTrainer, I set learning_rate to 8e-6, but the learning rate logged during training is 0.001:
trainer = ORPOTrainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    # peft_config=peft_config,
    tokenizer=tokenizer,
    args=ORPOConfig(
        max_length=cutoff_len,
        max_prompt_length=cutoff_len // 2,
        beta=0.1,
        per_device_train_batch_size=micro_batch_size,
        gradient_accumulation_steps=gradient_accumulation_steps,
        warmup_steps=0,
        num_train_epochs=num_epochs,
        lr_scheduler_type="cosine",
        learning_rate=8e-6,
        bf16=True,
        logging_steps=10,
        optim="galore_adamw_8bit_layerwise",
        optim_target_modules=[r".*attn.*", r".*mlp.*"],
        optim_args="rank=1024, update_proj_gap=500, scale=0.25",
        evaluation_strategy="steps" if val_set_size > 0 else "no",
        save_strategy="steps",
        eval_steps=100 if val_set_size > 0 else None,
        save_steps=100,
        output_dir=output_dir,
        save_total_limit=2,
        gradient_checkpointing=True,
        gradient_checkpointing_kwargs={"use_reentrant": True},
        load_best_model_at_end=True if val_set_size > 0 else False,
        ddp_find_unused_parameters=False if ddp else None,
        report_to="wandb" if use_wandb else None,
        run_name=wandb_run_name if use_wandb else None,
        do_train=True,
        remove_unused_columns=False,
    ),
)
Activated GaLoRE fine-tuning, depending on your model size and hardware, the training might take a while before starting. Please be patient !
0%| | 0/495 [00:00<?, ?it/s]Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'loss': 0.3557, 'grad_norm': 0.0, 'learning_rate': 0.001, 'rewards/chosen': -0.015678538009524345, 'rewards/rejected': -0.012379011139273643, 'rewards/accuracies': 0.19999998807907104, 'rewards/margins': -0.003299527335911989, 'logps/rejected': -0.12379010766744614, 'logps/chosen': -0.15678536891937256, 'logits/rejected': 0.7921055555343628, 'logits/chosen': 0.791210412979126, 'nll_loss': 0.2719877064228058, 'log_odds_ratio': -0.8374900817871094, 'log_odds_chosen': -0.25091928243637085, 'epoch': 0.06}
{'loss': 0.2634, 'grad_norm': 0.0, 'learning_rate': 0.001, 'rewards/chosen': -0.012010233476758003, 'rewards/rejected': -0.009977776557207108, 'rewards/accuracies': 0.29999998211860657, 'rewards/margins': -0.0020324576180428267, 'logps/rejected': -0.09977775812149048, 'logps/chosen': -0.12010233104228973, 'logits/rejected': 0.7489851713180542, 'logits/chosen': 0.7482139468193054, 'nll_loss': 0.1832979917526245, 'log_odds_ratio': -0.8010236620903015, 'log_odds_chosen': -0.16869042813777924, 'epoch': 0.12}
{'loss': 0.2482, 'grad_norm': 0.0, 'learning_rate': 0.001, 'rewards/chosen': -0.011346157640218735, 'rewards/rejected': -0.01022450439631939, 'rewards/accuracies': 0.4833333492279053, 'rewards/margins': -0.0011216530110687017, 'logps/rejected': -0.102245032787323, 'logps/chosen': -0.11346157640218735, 'logits/rejected': 0.7105721831321716, 'logits/chosen': 0.7108334898948669, 'nll_loss': 0.17242279648780823, 'log_odds_ratio': -0.7573043704032898, 'log_odds_chosen': -0.08471358567476273, 'epoch': 0.18}
{'loss': 0.2444, 'grad_norm': 0.0, 'learning_rate': 0.001, 'rewards/chosen': -0.012975988909602165, 'rewards/rejected': -0.013058923184871674, 'rewards/accuracies': 0.550000011920929, 'rewards/margins': 8.293241262435913e-05, 'logps/rejected': -0.13058921694755554, 'logps/chosen': -0.12975989282131195, 'logits/rejected': 0.6808757781982422, 'logits/chosen': 0.6832461953163147, 'nll_loss': 0.1756206750869751, 'log_odds_ratio': -0.687309741973877, 'log_odds_chosen': 0.04155167192220688, 'epoch': 0.24}