opencompass
[Feature] Support pass@1 evaluation for multiple predictions in MathEvaluator
Describe the feature
When a Hugging Face model is run with num_return_sequences set to a value greater than 1, each entry in the output column “predictions” becomes a list of strings instead of a single string. As a result, MathEvaluator always reports an accuracy of 0, regardless of whether any of the predictions is correct. It would be helpful if the evaluator's score function accepted list-type predictions and computed pass@1 from the multiple samples, following the approach in the DeepSeek-R1 technical report, where pass@1 is the average correctness over the k responses sampled for each problem.
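A minimal sketch of the requested scoring logic, assuming each prediction is either a string or a list of k sampled strings. The names `pass_at_1` and `naive_equiv` are hypothetical and are not part of the OpenCompass API. `naive_equiv` is only a placeholder for whatever answer-equivalence check MathEvaluator actually performs. For each problem, the sketch takes the fraction of correct samples and then averages those fractions across problems:

```python
from typing import Callable, List, Sequence, Union

# A prediction is either a single string or a list of k sampled strings.
Prediction = Union[str, List[str]]


def naive_equiv(pred: str, ref: str) -> bool:
    """Hypothetical stand-in for MathEvaluator's answer-equivalence check."""
    return pred.strip() == ref.strip()


def pass_at_1(
    predictions: Sequence[Prediction],
    references: Sequence[str],
    is_equiv: Callable[[str, str], bool] = naive_equiv,
) -> float:
    """Return pass@1 in percent, averaging per-sample correctness over k samples."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    total = 0.0
    for pred, ref in zip(predictions, references):
        # Wrap a single string prediction in a one-element list so both cases share one path.
        samples = [pred] if isinstance(pred, str) else list(pred)
        if not samples:
            continue  # a problem with no samples contributes 0
        correct = sum(is_equiv(s, ref) for s in samples)
        total += correct / len(samples)
    return 100.0 * total / len(predictions)


# Problem 1: 3 of 4 samples correct (0.75). Problem 2: a single correct string (1.0).
print(pass_at_1([["2", "3", "2", "2"], "5"], ["2", "5"]))  # 87.5
```

Because a plain string is treated as a single sample, this scoring reduces to ordinary accuracy when num_return_sequences is 1. Existing single-prediction configs would therefore keep their current scores.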
Will you implement it?
- [x] I would like to implement this feature and create a PR!