llm-foundry Provide a metric that uses Math-Verify

Open gsganden opened this issue 7 months ago • 0 comments

🚀 Feature Request

Provide a metric that uses Math-Verify to parse and compare mathematical expressions with more flexibility than InContextLearningGenerationExactMatchAccuracy.

Motivation

The Hugging Face blog post at https://huggingface.co/blog/math_verify_leaderboard reports that overly simple methods for evaluating LLM math performance can give very misleading results, a problem that Math-Verify addresses.

[Optional] Implementation

Create a MathVerifyAccuracy class that inherits from InContextLearningMetric, either in llmfoundry/eval/metrics/nlp.py or in a new llmfoundry/eval/metrics/math.py. The implementation should be relatively straightforward, and I would be happy to contribute it if desired.
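
To make the proposal concrete, here is a rough, framework-free sketch of the counting logic such a metric might use. It is not the final implementation: the class name MathVerifyAccuracySketch, the parse_fn and verify_fn parameters, and the simplified update signature are all hypothetical, chosen so the logic can be exercised without llm-foundry installed. A real version would inherit from InContextLearningMetric and would presumably follow that class's update signature and state handling. The only external calls assumed are parse and verify from the math_verify package.

```python
from typing import Callable, List, Optional


def _load_math_verify() -> tuple:
    # Imported lazily so the sketch can be read (and tested with stub
    # callables) on a machine where Math-Verify is not installed.
    from math_verify import parse, verify
    return parse, verify


class MathVerifyAccuracySketch:
    """Hypothetical sketch: accuracy based on mathematical equivalence.

    Not the proposed MathVerifyAccuracy itself; that class would inherit
    from InContextLearningMetric instead of keeping plain Python counters.
    """

    def __init__(
        self,
        parse_fn: Optional[Callable] = None,
        verify_fn: Optional[Callable] = None,
    ) -> None:
        # Fall back to Math-Verify unless both callables are supplied.
        if parse_fn is None or verify_fn is None:
            parse_fn, verify_fn = _load_math_verify()
        self.parse_fn = parse_fn
        self.verify_fn = verify_fn
        self.correct = 0
        self.total = 0

    def update(self, outputs: List[str], labels: List[List[str]]) -> None:
        # outputs: one generated string per sample.
        # labels: for each sample, a list of acceptable gold answers.
        for output, golds in zip(outputs, labels):
            predicted = self.parse_fn(output)
            # Count the sample as correct if any gold answer verifies;
            # the gold expression is passed first, the prediction second.
            if any(self.verify_fn(self.parse_fn(g), predicted) for g in golds):
                self.correct += 1
            self.total += 1

    def compute(self) -> float:
        # Fraction of samples judged equivalent; 0.0 before any update.
        return self.correct / self.total if self.total else 0.0
```

With Math-Verify installed, MathVerifyAccuracySketch() would use the library's own parse and verify. Passing stub callables instead keeps the counting logic testable in isolation, which may also be a useful pattern for the unit tests of the real metric.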

Additional context

Mar 03 '25 21:03 gsganden
