trl

Why not create an example of using PPO to train a summarization model?

Open • Bo396543018 opened this issue 1 year ago • 1 comment

Thank you for your excellent work. I have a question: I saw that you provide a script for training the reward model (RM) from the OpenAI paper 'Learning to Summarize from Human Feedback'. Why haven't you gone on to train a policy with PPO on that dataset?

Bo396543018 • Mar 09 '23 06:03

Lack of time and people - feel free to try it!

lvwerra • Mar 09 '23 13:03