llm-as-evaluator topics

langtest

545 stars, 50 forks, 545 watchers

Deliver safe & effective language models

Pacific-AI-Corp

benchmarks

ethics-in-ai

large-language-models

llm-test

LLM-IR-Bias-Fairness-Survey

58 stars, 3 forks, 58 watchers

Repository for the survey on bias and fairness in information retrieval (IR) with LLMs.

KID-22

bias

chatgpt

fairness

information-retrieval

xFinder

181 stars, 7 forks, 181 watchers

[ICLR 2025] xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation

IAAR-Shanghai

evaluation

gpt

llm

xfinder

Timo

24 stars, 3 forks, 24 watchers

Code and data for "Timo: Towards Better Temporal Reasoning for Language Models" (COLM 2024)

zhaochen0110

colm2024

llm-as-a-judge

llm-as-evaluator

llms

prometheus-eval

1.0k stars, 66 forks, 1.0k watchers

Evaluate your LLM's response with Prometheus and GPT4 💯

prometheus-eval

evaluation

gpt4

litellm

llm

cobbler

21 stars, 2 forks, 21 watchers

Code and data for Koo et al.'s ACL 2024 paper "Benchmarking Cognitive Biases in Large Language Models as Evaluators"

minnesotanlp

bias

bias-detection

evaluation

llm