llm-evaluation-framework topic


promptfoo

4.5k
Stars
346
Forks

Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command...

parea-sdk-py

74
Stars
6
Forks

Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)

MixEval

219
Stars
32
Forks

The official evaluation suite and dynamic data release for MixEval.

KIEval

32
Stars
2
Forks

[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models

fm-leaderboarder

18
Stars
5
Forks

FM-Leaderboard-er lets you create a leaderboard to find the best LLM or prompt for your own business use case, based on your data, task, and prompts.