llm-as-a-judge topic

List llm-as-a-judge repositories

xFinder

176
Stars
7
Forks

[ICLR 2025] xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation

xVerify

138
Stars
7
Forks

xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

ineqmath

52
Stars
7
Forks

Solving Inequality Proofs with Large Language Models.

circle-guard-bench

44
Stars
2
Forks

First-of-its-kind AI benchmark for evaluating the protection capabilities of large language model (LLM) guard systems (guardrails and safeguards)

docling-sdg

35
Stars
13
Forks

A set of tools to create synthetically-generated data from documents

Themis

20
Stars
1
Forks

The official repository for our EMNLP 2024 paper, Themis: A Reference-free NLG Evaluation Language Model with Flexibility and Interpretability.

CuREV

18
Stars
3
Forks
18
Watchers

Harnessing Large Language Models for Curated Code Reviews