evals
Exact Match template
Describe the feature or improvement you're requesting
The documentation in eval-templates.md describes basic/match.py as Match: `any([b.startswith(a) for b in B])` "[f]or a model completion `a` and a reference list of correct answers `B`". This is a poor fit for arithmetic and other algorithmic tasks, where we want the model response to exactly match some ideal answer, i.e., `any([b == a for b in B])`.
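To make the difference concrete, here is a minimal sketch of the two predicates exactly as they are written in this issue. The function names `prefix_match` and `exact_match` are hypothetical and are not taken from the evals codebase; the actual implementation in basic/match.py may differ.

```python
def prefix_match(a: str, B: list[str]) -> bool:
    # Predicate as quoted in this issue for the documented Match template.
    return any(b.startswith(a) for b in B)


def exact_match(a: str, B: list[str]) -> bool:
    # Predicate requested in this issue: full string equality.
    return any(b == a for b in B)


# Arithmetic example: the ideal answer is "12", the model answers "1".
ideal_answers = ["12"]
completion = "1"

prefix_result = prefix_match(completion, ideal_answers)  # "12".startswith("1")
exact_result = exact_match(completion, ideal_answers)    # "12" == "1"
print(prefix_result, exact_result)
```

With these inputs the prefix-based check accepts the wrong answer while the equality check rejects it, which is the mismatch described above.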
Additional context
No response
For arithmetic and other complex tasks, it is recommended to ask the model to reason before answering. In such cases, the prompt instructs the model to give its final answer in a fixed format, for example enclosed in square brackets, and the ideal answer is formatted the same way, like `[5]`.
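As a rough illustration of this suggestion, the sketch below pulls the last square-bracketed span out of a completion that contains reasoning and compares it with an ideal answer formatted as `[5]`. The helper `extract_bracketed_answer` and the sample completion are assumptions made for illustration; the comment above does not say how the comparison is performed, and this is not code from the evals repository.

```python
import re
from typing import Optional


def extract_bracketed_answer(completion: str) -> Optional[str]:
    # Hypothetical helper: return the last "[...]" span, brackets included,
    # or None if the completion contains no bracketed span.
    matches = re.findall(r"\[[^\[\]]*\]", completion)
    return matches[-1] if matches else None


# Made-up completion where the model reasons first, then answers in brackets.
completion = "2 + 3: start at 2 and count up 3 to reach 5. Final answer: [5]"
ideal_answers = ["[5]"]

extracted = extract_bracketed_answer(completion)
is_correct = extracted in ideal_answers
print(extracted, is_correct)
```

Taking the last bracketed span is a design choice that tolerates brackets appearing earlier in the reasoning; a stricter grader might instead require exactly one bracketed span.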