llm-as-a-judge topic
xFinder
[ICLR 2025] xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation
xVerify
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
ineqmath
Solving Inequality Proofs with Large Language Models
circle-guard-bench
First-of-its-kind AI benchmark for evaluating the protection capabilities of large language model (LLM) guard systems (guardrails and safeguards)
docling-sdg
A set of tools to create synthetically-generated data from documents
OmniVerifier
Generative Universal Verifier as Multimodal Meta-Reasoner
Themis
The official repository for our EMNLP 2024 paper, Themis: A Reference-free NLG Evaluation Language Model with Flexibility and Interpretability.
CuREV
Harnessing Large Language Models for Curated Code Reviews