Open-Assistant
Any plan to re-implement the reward model training from InstructGPT?
It looks like the reward model is RankGen at the moment? I thought we were going to use an InstructGPT-style pairwise ranking loss over the K ranked replies per prompt, rather than this prefix/suffix one. Is there a plan to follow up on that? I didn't see an existing issue or PR, so I wanted to ask.
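For reference, here is a minimal sketch of the kind of loss I mean, as I understand it from the InstructGPT paper: for each prompt, every pair among the K ranked replies contributes `-log(sigmoid(r_preferred - r_rejected))`, and the sum is averaged over the K-choose-2 pairs. The function names and the plain-float scores are my own assumptions for illustration; this is not code from the Open-Assistant repo, and a real implementation would operate on batched model outputs in a tensor library.

```python
import math
from itertools import combinations


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_ranking_loss(scores_best_to_worst: list[float]) -> float:
    """Average pairwise ranking loss for one prompt.

    `scores_best_to_worst` holds the reward model's scalar scores for the
    K replies to a single prompt, ordered by the human ranking (best first).
    This input convention is an assumption made for the sketch.
    """
    if len(scores_best_to_worst) < 2:
        raise ValueError("need at least two ranked replies to form a pair")

    # combinations() preserves input order, so in every pair the first
    # element is the reply that humans ranked higher.
    pairs = list(combinations(scores_best_to_worst, 2))

    total = 0.0
    for preferred, rejected in pairs:
        # -log(sigmoid(d)) equals softplus(-d), with d = preferred - rejected.
        total += softplus(-(preferred - rejected))

    # Normalise by the number of pairs, i.e. K choose 2.
    return total / len(pairs)
```

When the scores agree with the human ranking, the loss is small; when they are reversed, it grows; and when all scores are equal, each pair contributes log(2).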
The K pairwise loss was at least discussed in the beginning. We would definitely love to experiment with and compare different reward models. If you (or anyone else) are interested in working on this, please join the OA Discord and ping @sanagno (Sortiris on Discord) or me.
Thanks. Yes, I would be interested. I need a few more days to wrap up my current work, but I'll let you guys know when I'm ready to scope out the work and implement it.
I will close this as an answered question, but if you are interested in participating in ML development, feel free to comment on an issue or come and discuss with us in the volunteers channel of the Discord server :)
Sounds good. I'll reach out this weekend.
On Fri, Feb 24, 2023 at 4:50 PM Oliver Stanley wrote:
Closed #1584 https://github.com/LAION-AI/Open-Assistant/issues/1584 as completed.