
[Feature] Support pass@1 evaluation for multi predictions in MathEvaluator

Open DELEnomore opened this issue 3 months ago • 0 comments

Describe the feature

When a Hugging Face model is run with num_return_sequences set greater than 1, each entry in the output column “predictions” is a list of strings instead of a single string. As a result, MathEvaluator always returns an accuracy of 0, even when the predictions are correct. It would be helpful if the score function accepted list-type predictions and computed pass@1 over the multiple samples, similar to the approach described in the DeepSeek-R1 technical report.
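
To make the request concrete, here is a minimal sketch of the scoring I have in mind. It is not the existing MathEvaluator code: the function name `pass_at_1`, the `is_equiv` argument, and the returned dictionary keys are all hypothetical, and the default exact-match comparison is only a stand-in for whatever math-equivalence check the evaluator really uses. As I read the DeepSeek-R1 report, pass@1 for a problem is the fraction of its k sampled responses that are correct, and the reported number is the mean over problems.

```python
from typing import Callable, Sequence, Union

# One prediction per problem is either a single string or a list of
# sampled strings (the num_return_sequences > 1 case).
Prediction = Union[str, Sequence[str]]


def _exact_match(pred: str, ref: str) -> bool:
    # Placeholder comparison (assumption): a real evaluator would
    # normalise and compare math expressions instead.
    return pred.strip() == ref.strip()


def pass_at_1(
    predictions: Sequence[Prediction],
    references: Sequence[str],
    is_equiv: Callable[[str, str], bool] = _exact_match,
) -> dict:
    """Hypothetical scorer: mean per-problem fraction of correct samples."""
    if len(predictions) != len(references):
        return {"error": "predictions and references have different lengths"}

    per_problem = []
    for pred, ref in zip(predictions, references):
        # Treat a bare string as a single-sample list so both shapes work.
        samples = [pred] if isinstance(pred, str) else list(pred)
        if not samples:
            per_problem.append(0.0)
            continue
        correct = sum(1 for sample in samples if is_equiv(sample, ref))
        per_problem.append(correct / len(samples))

    accuracy = 100 * sum(per_problem) / len(per_problem) if per_problem else 0.0
    return {"accuracy": accuracy, "details": per_problem}


result = pass_at_1([["4", "5", "4", "4"], "7"], ["4", "7"])
print(result)
```

Wrapping a plain string as a one-element list keeps the single-prediction case unchanged, so existing configs would score the same as before.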

Will you implement it?

  • [x] I would like to implement this feature and create a PR!

DELEnomore, Aug 28 '25 08:08