trl
Error when training Gemma with DPO
transformers==4.38.1, trl==0.7.11, torch==2.0.0
When I train Gemma with `DPOTrainer`, all the metrics are NaN:
```json
"log_history": [
  {
    "epoch": 0.0,
    "grad_norm": NaN,
    "learning_rate": 0.0,
    "logits/chosen": NaN,
    "logits/rejected": NaN,
    "logps/chosen": NaN,
    "logps/rejected": NaN,
    "loss": 9078019.2,
    "rewards/accuracies": 0.0,
    "rewards/chosen": NaN,
    "rewards/margins": NaN,
    "rewards/rejected": NaN,
    "step": 5
  },
```
But when I train Qwen with the same DPO settings, its metrics are fine:
```json
"log_history": [
  {
    "epoch": 0.0,
    "grad_norm": 2.8560128211975098,
    "learning_rate": 1e-05,
    "logits/chosen": -2.6137852668762207,
    "logits/rejected": -2.465115785598755,
    "logps/chosen": -428.712158203125,
    "logps/rejected": -387.26708984375,
    "loss": 0.6925,
    "rewards/accuracies": 0.3375000059604645,
    "rewards/chosen": -0.00025393199757672846,
    "rewards/margins": 0.001296185771934688,
    "rewards/rejected": -0.001550117740407586,
    "step": 5
  },
```
I train the model with fp16 on a V100.
Hi @yangjianxin1 Hmm, interesting, in what precision are you loading your model?
I load the model in fp16.
Can you instead try to load the model in fp32 and enable mixed-precision training with `fp16=True` in `TrainingArguments`?
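To make the suggestion concrete, here is a minimal sketch of that setup for trl 0.7.x: the model is loaded in fp32 (no `torch_dtype=torch.float16`) and mixed precision is turned on only through `fp16=True` in `TrainingArguments`. The model ID, hyperparameters, and `train_dataset` are placeholders, not the reporter's actual configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "google/gemma-2b"  # placeholder model ID

# Load weights in full fp32 precision; do NOT pass torch_dtype=torch.float16 here.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)
tokenizer = AutoTokenizer.from_pretrained(model_id)

training_args = TrainingArguments(
    output_dir="gemma-dpo",
    per_device_train_batch_size=1,
    learning_rate=1e-5,
    fp16=True,  # autocast forward/backward to fp16, keep fp32 master weights
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # with a PEFT adapter the reference model can be omitted
    args=training_args,
    beta=0.1,
    train_dataset=train_dataset,  # placeholder: your preference dataset
    tokenizer=tokenizer,
)
trainer.train()
```

The difference from the reporter's setup is that the optimizer keeps fp32 master weights and a loss scaler, instead of every weight and activation living in fp16 from the start.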
I train the model with QLoRA. Could this be caused by the compute precision?
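Compute precision is a plausible culprit: float16 only represents magnitudes up to about 65504, so a model whose activations grow large (as has been widely reported for Gemma trained in bf16) can overflow to `inf`, after which arithmetic on those values yields NaN. A quick demonstration of the mechanism (not of Gemma itself):

```python
import torch

x = torch.tensor([70000.0])        # representable in fp32
h = x.half()                       # overflows float16 -> inf
print(h)                           # tensor([inf], dtype=torch.float16)
print(h - h)                       # inf - inf -> tensor([nan], dtype=torch.float16)

# bfloat16 trades precision for the same exponent range as fp32,
# which is why it does not overflow here (but V100s lack bf16 support).
print(torch.finfo(torch.float16).max)   # 65504.0
print(x.bfloat16())                     # tensor([69632.], dtype=torch.bfloat16)
```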
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.