Jie Cheng
This PR serves a similar purpose to [#48](https://github.com/RLHFlow/RLHF-Reward-Modeling/pull/48), namely to speed up PRM evaluation. But instead of modifying the content of the conversation (which can lead to...