trl Why not create an example of using PPO to train a summarization model?

Open Bo396543018 opened this issue 1 year ago • 1 comments

Thank you for your excellent work. I have a question: I saw that you provided a script for training the reward model (RM) from the OpenAI paper 'Learning to Summarize from Human Feedback'. Why haven't you gone further and run PPO training on that dataset?

Mar 09 '23 06:03 Bo396543018

Lack of time and people - feel free to try it!

Mar 09 '23 13:03 lvwerra
