llm-evaluation topic

List llm-evaluation repositories

Awesome-LLM-in-Social-Science

Stars: 233, Forks: 13

Awesome papers involving LLMs in Social Science.

promptfoo

Stars: 4.5k, Forks: 346

Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command...

continuous-eval

Stars: 436, Forks: 28

Data-Driven Evaluation for LLM-Powered Applications

DCR-consistency

Stars: 21, Forks: 3

DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models

CommonGen-Eval

Stars: 84, Forks: 3

Evaluating LLMs with CommonGen-Lite

athina-evals

Stars: 210, Forks: 12

Python SDK for running evaluations on LLM-generated responses

just-eval

Stars: 74, Forks: 6

A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.

agenta

Stars: 1.2k, Forks: 182

The all-in-one LLM developer platform: prompt management, evaluation, human feedback, and deployment all in one place.

leaf-playground

Stars: 23, Forks: 0

A framework for building scenario simulation projects in which human and LLM-based agents can participate, with a user-friendly web UI to visualize simulations and support for automatic evaluation on agent ac...