[ICLR 2025] xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation
IAAR-Shanghai
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations