
Per-token KL penalty from the SFT model during PPO training

Open MXuer opened this issue 1 year ago • 0 comments

  • I can't find the code for the "per-token KL penalty from the SFT model" used during PPO training in model/model_training/trainer_rl.py. Maybe I missed something. Could you tell me how the two terms (the PPO objective and the KL penalty) are combined?
  • I found the loss function "PolyLoss" in model/model_training/losses.py. Is this the loss function for the "per-token KL penalty from the SFT model" part? If so, why is a cross-entropy (CE) term combined with it?
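
For context, one common formulation in RLHF-style PPO folds the per-token KL penalty into the reward rather than into a separate loss term: each generated token is penalized by the log-probability gap between the current policy and the frozen SFT model, and the scalar reward-model score is added on the last token. The sketch below illustrates only that general idea. The function name, argument names, and the `kl_coef` default are hypothetical, and nothing here is confirmed to match what trainer_rl.py actually does.

```python
from typing import List


def kl_penalized_rewards(
    policy_logprobs: List[float],
    sft_logprobs: List[float],
    score: float,
    kl_coef: float = 0.1,  # hypothetical default, not taken from the repository
) -> List[float]:
    """Build per-token rewards for one generated response.

    policy_logprobs[t] and sft_logprobs[t] are the log-probabilities that the
    current policy and the frozen SFT model assign to the t-th sampled token.
    """
    if not policy_logprobs or len(policy_logprobs) != len(sft_logprobs):
        raise ValueError("log-prob lists must be non-empty and of equal length")

    # Per-token KL estimate on the sampled token: log pi(a_t) - log pi_sft(a_t).
    # It enters the reward with a negative sign, so drifting from SFT is penalized.
    rewards = [
        -kl_coef * (lp - ls) for lp, ls in zip(policy_logprobs, sft_logprobs)
    ]

    # The scalar reward-model score is credited once, on the final token.
    rewards[-1] += score
    return rewards


if __name__ == "__main__":
    print(kl_penalized_rewards([-1.0, -2.0, -0.5], [-1.5, -2.0, -1.0], score=1.0))
```

Under this formulation PPO is then run on these shaped rewards as usual, so the penalty does not show up as its own loss function.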

Thanks a lot.

MXuer, Apr 16 '23 15:04