prm800k
prm800k copied to clipboard
Question about reward model evaluation metric
Thanks for this great work! I have one question about how you measure the performance of the reward model. You mentioned in section 2.1 that 'We evaluate a reward model by its ability to perform best-of-N search over uniformly sampled solutions from the generator'. I am curious about, why not directly calculate the reward model accuracy over the test set and use that as the metric?