Open-Assistant: Any plan to re-implement the reward model training from InstructGPT?


ethanyanjiali opened this issue.

It looks like the reward model is RankGen at the moment? I thought we were going to use an InstructGPT-like K-wise pairwise ranking loss instead of this prefix/suffix one. Is there a plan to follow up on that? I didn't see an existing issue or PR, so I wanted to ask.
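For context, the InstructGPT paper trains its reward model on K ranked completions per prompt: for every pair where completion y_w was ranked above y_l, the loss term is -log σ(r(x, y_w) − r(x, y_l)), and the terms are averaged over all C(K, 2) pairs. Below is a minimal pure-Python sketch of that loss for a single prompt. The function names and the input format (a list of rewards already sorted by human preference) are hypothetical, and this is not Open-Assistant's implementation.

```python
import itertools
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def k_wise_ranking_loss(rewards_ranked: list[float]) -> float:
    """InstructGPT-style pairwise ranking loss for a single prompt (sketch).

    rewards_ranked: scalar rewards r(x, y) for the K completions of one prompt,
    ordered from most preferred to least preferred (hypothetical input format).
    Every one of the C(K, 2) pairs contributes -log(sigmoid(r_w - r_l)),
    and the result is averaged over the pairs.
    """
    k = len(rewards_ranked)
    if k < 2:
        raise ValueError("need at least two ranked completions")
    pairs = list(itertools.combinations(range(k), 2))
    total = 0.0
    for i, j in pairs:
        # i < j, so completion i was ranked above completion j (i is the "winner").
        total -= log_sigmoid(rewards_ranked[i] - rewards_ranked[j])
    return total / len(pairs)


# Usage: three completions, rewards listed in the labeler's preferred order.
print(k_wise_ranking_loss([2.0, 1.0, 0.0]))
```

In a real trainer the rewards would be reward-model outputs computed with a tensor library so gradients can flow. InstructGPT also keeps all C(K, 2) comparisons from one prompt in the same batch element, which is what the per-prompt averaging here mirrors; a prefix/suffix contrastive objective like RankGen's does not use human rankings in this way.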

Feb 15 '23 07:02 ethanyanjiali

A K-wise pairwise loss was at least discussed in the beginning. We would definitely love to experiment with and compare different reward models. If you (or anyone else) are interested in working on this, please join the OA Discord and ping @sanagno (Sortiris on Discord) or me.

Feb 15 '23 08:02 andreaskoepf

Thanks! Yes, I would be interested. I need a few more days to wrap up my current work, but I'll let you know when I'm ready to scope out the work and implement it.

Feb 16 '23 06:02 ethanyanjiali

I will close this as an answered question, but if you are interested in participating in ML development, feel free to comment on an issue or come and discuss with us in the volunteers channel of the Discord server :)

Feb 25 '23 00:02 olliestanley

Sounds good. I'm about to reach out this weekend.

On Fri, Feb 24, 2023 at 4:50 PM Oliver Stanley wrote:

Closed #1584 https://github.com/LAION-AI/Open-Assistant/issues/1584 as completed.


Feb 25 '23 01:02 ethanyanjiali
