trl
Error when training Gemma with DPO
transformers==4.38.1, trl==0.7.11, torch==2.0.0
When I train Gemma with `DPOTrainer`, all the metrics are NaN:
```json
"log_history": [
  {
    "epoch": 0.0,
    "grad_norm": NaN,
    "learning_rate": 0.0,
    "logits/chosen": NaN,
    "logits/rejected": NaN,
    "logps/chosen": NaN,
    "logps/rejected": NaN,
    "loss": 9078019.2,
    "rewards/accuracies": 0.0,
    "rewards/chosen": NaN,
    "rewards/margins": NaN,
    "rewards/rejected": NaN,
    "step": 5
  },
```
But when I train Qwen with the same DPO settings, its metrics are fine:
```json
"log_history": [
  {
    "epoch": 0.0,
    "grad_norm": 2.8560128211975098,
    "learning_rate": 1e-05,
    "logits/chosen": -2.6137852668762207,
    "logits/rejected": -2.465115785598755,
    "logps/chosen": -428.712158203125,
    "logps/rejected": -387.26708984375,
    "loss": 0.6925,
    "rewards/accuracies": 0.3375000059604645,
    "rewards/chosen": -0.00025393199757672846,
    "rewards/margins": 0.001296185771934688,
    "rewards/rejected": -0.001550117740407586,
    "step": 5
  },
```
I train the model with fp16 on a V100.
Hi @yangjianxin1 Hmm, interesting, in what precision are you loading your model?
I load the model in fp16.
Can you instead try to load the model in fp32 and enable mixed-precision training with `fp16=True` in `TrainingArguments`?
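To make the suggestion concrete, here is a minimal sketch of that setup for trl 0.7.x: the model is loaded in fp32 (no `torch_dtype=torch.float16`) and mixed precision is turned on only through `fp16=True` in `TrainingArguments`. The model ID, hyperparameters, and `train_dataset` are placeholders, not the reporter's actual configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "google/gemma-2b"  # placeholder model ID

# Load weights in full fp32 precision; do NOT pass torch_dtype=torch.float16 here.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)
tokenizer = AutoTokenizer.from_pretrained(model_id)

training_args = TrainingArguments(
    output_dir="gemma-dpo",
    per_device_train_batch_size=1,
    learning_rate=1e-5,
    fp16=True,  # autocast forward/backward to fp16, keep fp32 master weights
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # with a PEFT adapter the reference model can be omitted
    args=training_args,
    beta=0.1,
    train_dataset=train_dataset,  # placeholder: your preference dataset
    tokenizer=tokenizer,
)
trainer.train()
```

The difference from the reporter's setup is that the optimizer keeps fp32 master weights and a loss scaler, instead of every weight and activation living in fp16 from the start.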
I train the model with QLoRA. Could this be caused by the compute precision?
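Compute precision is a plausible culprit: float16 only represents magnitudes up to about 65504, so a model whose activations grow large (as has been widely reported for Gemma trained in bf16) can overflow to `inf`, after which arithmetic on those values yields NaN. A quick demonstration of the mechanism (not of Gemma itself):

```python
import torch

x = torch.tensor([70000.0])        # representable in fp32
h = x.half()                       # overflows float16 -> inf
print(h)                           # tensor([inf], dtype=torch.float16)
print(h - h)                       # inf - inf -> tensor([nan], dtype=torch.float16)

# bfloat16 trades precision for the same exponent range as fp32,
# which is why it does not overflow here (but V100s lack bf16 support).
print(torch.finfo(torch.float16).max)   # 65504.0
print(x.bfloat16())                     # tensor([69632.], dtype=torch.bfloat16)
```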
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.