
Question about reward model evaluation metric

Status: Open · waterhorse1 opened this issue 8 months ago · 0 comments

Thanks for this great work! I have a question about how you measure the performance of the reward model. In section 2.1 you write that "We evaluate a reward model by its ability to perform best-of-N search over uniformly sampled solutions from the generator." I am curious: why not directly compute the reward model's accuracy on the test set and use that as the metric?
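To make the distinction concrete, here is a minimal sketch. It assumes each sampled solution already carries a reward-model score and a binary correctness label. The function names, the 0.5 threshold, and the toy data are hypothetical, and this is not the prm800k repository's evaluation code. Per-solution accuracy judges every candidate independently against a threshold. Best-of-N only asks whether the single top-scored candidate for each problem is correct.

```python
from typing import List, Tuple

# Hypothetical representation: each candidate is (reward_model_score, is_actually_correct)
Candidate = Tuple[float, bool]


def per_solution_accuracy(problems: List[List[Candidate]], threshold: float = 0.5) -> float:
    """Classification accuracy: threshold every score and compare it with its label."""
    hits = 0
    total = 0
    for candidates in problems:
        for score, correct in candidates:
            hits += (score >= threshold) == correct
            total += 1
    return hits / total


def best_of_n_accuracy(problems: List[List[Candidate]]) -> float:
    """Best-of-N: keep only the highest-scored candidate per problem and check it."""
    solved = 0
    for candidates in problems:
        _, correct = max(candidates, key=lambda c: c[0])
        solved += correct
    return solved / len(problems)


# Toy data: N = 4 already-sampled candidates for each of 2 problems
problems = [
    [(0.9, False), (0.8, True), (0.1, False), (0.2, False)],
    [(0.7, True), (0.3, False), (0.2, False), (0.1, False)],
]
print(per_solution_accuracy(problems))  # 0.875
print(best_of_n_accuracy(problems))     # 0.5
```

On this toy data the reward model labels 7 of 8 candidates correctly (0.875), but it picks a correct solution for only 1 of 2 problems (0.5). Its one mistake happens to be the highest-scored candidate. The two metrics can therefore diverge, because best-of-N depends on ranking within each problem rather than on average per-solution correctness.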

Posted by waterhorse1 on Nov 01 '23 22:11