Error about training gemma with dpo

Open yangjianxin1 opened this issue 1 year ago • 6 comments

transformers==4.38.1, trl==0.7.11, torch==2.0.0

When I train Gemma with DPOTrainer, nearly all the metrics are NaN and the loss is huge:

"log_history": [
    {
      "epoch": 0.0,
      "grad_norm": NaN,
      "learning_rate": 0.0,
      "logits/chosen": NaN,
      "logits/rejected": NaN,
      "logps/chosen": NaN,
      "logps/rejected": NaN,
      "loss": 9078019.2,
      "rewards/accuracies": 0.0,
      "rewards/chosen": NaN,
      "rewards/margins": NaN,
      "rewards/rejected": NaN,
      "step": 5
    },

But when I train Qwen with the same DPO settings, its metrics are fine:

"log_history": [
    {
      "epoch": 0.0,
      "grad_norm": 2.8560128211975098,
      "learning_rate": 1e-05,
      "logits/chosen": -2.6137852668762207,
      "logits/rejected": -2.465115785598755,
      "logps/chosen": -428.712158203125,
      "logps/rejected": -387.26708984375,
      "loss": 0.6925,
      "rewards/accuracies": 0.3375000059604645,
      "rewards/chosen": -0.00025393199757672846,
      "rewards/margins": 0.001296185771934688,
      "rewards/rejected": -0.001550117740407586,
      "step": 5
    },

yangjianxin1 • Feb 27 '24 10:02

I am training the model in fp16 on a V100.

yangjianxin1 • Feb 27 '24 12:02

Hi @yangjianxin1. Hmm, interesting. In what precision are you loading your model?

younesbelkada • Feb 28 '24 00:02

I load the model in fp16.

yangjianxin1 • Feb 28 '24 02:02
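
For context on why the load precision is worth checking: fp16 cannot represent values above 65504, so a large intermediate value overflows to infinity, and arithmetic on infinities then yields NaN. The sketch below illustrates that mechanism with the standard library only. It is a generic illustration, not a diagnosis of this run, and the helper name `to_fp16` is made up for the example.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest fp16 value.

    struct raises OverflowError for out-of-range inputs, whereas fp16
    hardware would produce infinity, so that case is mapped to +/-inf.
    """
    try:
        return struct.unpack("e", struct.pack("e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


# Two values that are ordinary in fp32 but too large for fp16.
a = to_fp16(70000.0)
b = to_fp16(90000.0)

# Subtracting two infinities gives NaN, which then spreads to every
# quantity computed from it (logits, log-probs, rewards, gradients).
diff = a - b

print(a, b, diff)
```

Once a single NaN appears in the forward pass, every downstream metric that depends on it is NaN as well, which matches the shape of the log above, though the thread does not confirm this is what happened.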

Can you instead try loading the model in fp32 and enabling mixed-precision training with fp16=True in TrainingArguments?

younesbelkada • Feb 28 '24 06:02
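
A minimal sketch of that suggestion, assuming the standard `transformers` loading path. The checkpoint name and output directory are placeholders, since the thread does not say which Gemma checkpoint or paths were used. Imports are kept inside the functions so the file can be read and imported without a GPU.

```python
# Placeholders: neither value appears in the thread.
MODEL_ID = "google/gemma-2b"         # assumed Gemma checkpoint
OUTPUT_DIR = "gemma-dpo-fp32-load"   # hypothetical output directory


def load_model_fp32(model_id: str = MODEL_ID):
    """Load the weights in full precision rather than fp16."""
    import torch
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float32,  # keep the stored weights in fp32
    )


def make_training_args(output_dir: str = OUTPUT_DIR):
    """Enable fp16 mixed precision during training.

    With fp16=True the Trainer runs the forward pass under autocast and
    applies loss scaling, while the weights themselves stay in fp32.
    Note that fp16=True requires a CUDA device.
    """
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, fp16=True)
```

The returned objects would then be passed to `DPOTrainer` as `model` and `args`, alongside the dataset, tokenizer, and reference model from the existing setup.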

I am training the model with QLoRA. Could the compute precision be the cause?

yangjianxin1 • Mar 01 '24 08:03
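
The thread goes stale before this question is answered. For reference, in a QLoRA setup built on bitsandbytes (an assumption, since the loading code is not shown), the compute precision is set separately from the 4-bit storage format through `bnb_4bit_compute_dtype`. The sketch below shows where that setting lives. The helper names are made up, and switching to fp32 compute is something to try, not a confirmed fix.

```python
def compute_dtype_name(prefer_fp32: bool = True) -> str:
    """Pick the dtype used for matmuls on the dequantized 4-bit weights."""
    return "float32" if prefer_fp32 else "float16"


def make_4bit_config(prefer_fp32: bool = True):
    """Build a 4-bit quantization config with an explicit compute dtype.

    Requires torch, transformers, and bitsandbytes to be installed.
    """
    import torch
    from transformers import BitsAndBytesConfig

    return BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        # Precision of the actual computation, independent of 4-bit storage.
        bnb_4bit_compute_dtype=getattr(torch, compute_dtype_name(prefer_fp32)),
    )
```

The config would be passed as `quantization_config` to `AutoModelForCausalLM.from_pretrained`. Comparing a run with fp16 compute against one with fp32 compute would show whether the compute dtype is what produces the NaN values.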

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions[bot] • Mar 28 '24 15:03