Open-Assistant

Per-token KL penalty from the SFT model during PPO training

Open · MXuer opened this issue 1 year ago · 0 comments

I can't find where the "per-token KL penalty from the SFT model" is applied during PPO training in model/model_training/trainer_rl.py; maybe I missed something. Could you tell me how the PPO loss and this KL penalty are combined?
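
For context, a common way to implement this penalty (as described for InstructGPT) is to fold it into per-token rewards rather than add it as a separate loss term: each response token gets r_t = -beta * (log pi_RL(y_t) - log pi_SFT(y_t)), the reward-model score is added at the last token, and PPO then optimizes these shaped rewards. The sketch below illustrates that assumption only; it is not code from trainer_rl.py, and the function name `per_token_rewards` and the default `beta` are hypothetical.

```python
from typing import List


def per_token_rewards(
    policy_logprobs: List[float],
    ref_logprobs: List[float],
    score: float,
    beta: float = 0.1,  # hypothetical KL coefficient
) -> List[float]:
    """Shape rewards with a per-token KL penalty (InstructGPT-style sketch).

    policy_logprobs / ref_logprobs: log-probabilities of the sampled response
    tokens under the policy being trained and the frozen SFT (reference) model.
    score: scalar reward-model score for the whole response.
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    if not policy_logprobs:
        raise ValueError("response must contain at least one token")
    # Per-token KL estimate: log pi_RL(y_t) - log pi_SFT(y_t), scaled by -beta.
    rewards = [-beta * (lp - lr) for lp, lr in zip(policy_logprobs, ref_logprobs)]
    # The reward-model score is only assigned to the final response token.
    rewards[-1] += score
    return rewards


# Usage: three response tokens, reward-model score 1.5.
shaped = per_token_rewards([-1.0, -0.5, -2.0], [-1.2, -0.5, -1.0], score=1.5)
```

Under this design the KL term never appears as its own loss; it reaches the policy gradient through the advantages computed from `shaped`.
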
I found the loss function "PolyLoss" in model/model_training/losses.py. Is this the loss used for the "per-token KL penalty from the SFT model"? If so, why is it combined with a CE (cross-entropy) term?
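
For comparison, the published Poly-1 form of PolyLoss (Leng et al., 2022) is defined as cross-entropy plus a correction term: Poly-1 = CE + epsilon * (1 - p_t), where p_t is the predicted probability of the target class. If losses.py follows that formulation (an assumption, not checked here), the CE term is part of the definition, and the loss depends only on the model's own prediction and the label, with no reference-model term. The single-example sketch below uses a hypothetical function name and default `epsilon`.

```python
import math
from typing import List


def poly1_loss(logits: List[float], target: int, epsilon: float = 1.0) -> float:
    """Poly-1 loss for one example: CE + epsilon * (1 - p_t).

    epsilon = 0 recovers plain cross-entropy (hypothetical default of 1.0).
    """
    # Numerically stable log-softmax of the target class.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    log_pt = logits[target] - log_z
    ce = -log_pt              # standard cross-entropy term
    pt = math.exp(log_pt)     # probability assigned to the target class
    return ce + epsilon * (1.0 - pt)
```

With two equal logits, p_t = 0.5, so the loss is ln 2 from the CE term plus 0.5 from the polynomial correction when epsilon = 1.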

Thanks a lot.

Apr 16 '23 15:04 MXuer