llm-evaluation topic

Repositories tagged llm-evaluation:

Awesome-LLM-in-Social-Science

158 Stars · 8 Forks

Awesome papers involving LLMs in Social Science.

promptfoo

3.1k Stars · 205 Forks

Test your prompts, models, and RAG pipelines. Catch regressions and improve prompt quality. LLM evals for OpenAI, Azure, Anthropic, Gemini, Mistral, Llama, Bedrock, Ollama, and other local & private models wit...

continuous-eval

337 Stars · 16 Forks

Open-Source Evaluation for GenAI Application Pipelines

DCR-consistency

19 Stars · 2 Forks

DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models

CommonGen-Eval

80 Stars · 3 Forks

Evaluating LLMs with CommonGen-Lite
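
CommonGen-style benchmarks test generative commonsense: given a set of concepts, the model must produce a coherent sentence that uses all of them. As a rough illustration (this is not the repository's actual scorer, which differs), a minimal concept-coverage metric can be sketched as:

```python
def concept_coverage(sentence: str, concepts: list[str]) -> float:
    """Fraction of required concepts that appear in the sentence.

    Deliberately simple: a lowercase prefix match per word, so common
    inflections ("dogs", "running") still count. Real CommonGen-Lite
    scoring is more involved; this only illustrates the task shape.
    """
    words = sentence.lower().split()
    hits = sum(
        1 for concept in concepts
        if any(word.startswith(concept.lower()) for word in words)
    )
    return hits / len(concepts)

# All three required concepts are covered by the generated sentence.
score = concept_coverage("The dog caught the ball in the park",
                         ["dog", "ball", "park"])
```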

athina-evals

140 Stars · 11 Forks

Python SDK for running evaluations on LLM-generated responses

just-eval

63 Stars · 4 Forks

A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
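
Multi-aspect LLM-as-judge evaluation scores each response along several named dimensions and aggregates them, which keeps the result interpretable. A minimal sketch of that pattern follows; the aspect names and the stubbed judge are hypothetical stand-ins, not just-eval's actual rubric or API:

```python
from statistics import mean

# Hypothetical rubric; a real judge would prompt an LLM per aspect.
ASPECTS = ["helpfulness", "clarity", "factuality", "depth"]

def judge(response: str, aspect: str) -> int:
    """Stand-in for an LLM judge call returning a 1-5 score.

    A toy heuristic so the sketch runs without an API key: every
    aspect gets a neutral 3, and "depth" gets a bonus for longer text.
    """
    score = 3
    if aspect == "depth" and len(response.split()) > 20:
        score += 1
    return score

def multi_aspect_eval(response: str) -> dict:
    """Per-aspect scores plus their mean, keyed by aspect name."""
    scores = {aspect: judge(response, aspect) for aspect in ASPECTS}
    scores["overall"] = mean(scores[a] for a in ASPECTS)
    return scores
```

Keeping per-aspect scores alongside the aggregate is what makes this style of evaluation interpretable: a low overall number can be traced back to the dimension that caused it.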

agenta

908 Stars · 158 Forks

The all-in-one LLM developer platform: prompt management, evaluation, human feedback, and deployment all in one place.

leaf-playground

21 Stars · 0 Forks

A framework for building scenario-simulation projects in which both human and LLM-based agents can participate, with a user-friendly web UI to visualize simulations and support for automatic evaluation of agent ac...