lm-evaluation topic

List lm-evaluation repositories

latxa

31 stars, 0 forks, 31 watchers

Latxa: An Open Language Model and Evaluation Suite for Basque

xFinder

181 stars, 7 forks, 181 watchers

[ICLR 2025] xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation

RAG-evaluation-harnesses

23 stars, 2 forks, 23 watchers

An evaluation suite for Retrieval-Augmented Generation (RAG).

CiteME

48 stars, 5 forks, 48 watchers

CiteME is a benchmark that tests the ability of language models to find the papers cited in scientific texts.