MOSS-RLHF
Release of the reward model training code?
It's an excellent paper and makes a significant contribution to the entire alignment community. I wanted to ask whether you have any plans to open-source the training code for the reward model.
Thank you for your great support! Because reward model training involves more methods, it will be covered in the second part of the technical report. Thank you for your support and recognition!
Thank you for your response. May I know if there is a specific timeline for the release of the second part of the technical report?
Probably in August or September of this year. Thanks for your interest.