Topic: llm-evaluation-framework

Repositories tagged with the llm-evaluation-framework topic:

promptfoo

6.9k Stars · 552 Forks

Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command...
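
The "declarative configs" idea is that test cases are data, not code. promptfoo itself is configured in YAML and driven by its CLI; the Python sketch below is purely illustrative of that tests-as-data shape, and every name in it (config, call_llm, assert_contains) is hypothetical rather than promptfoo's actual API.

    # Illustrative only: a tiny declarative test spec and runner.
    # promptfoo itself uses YAML configs and its own CLI; this just
    # shows the shape of "tests as data" that such configs express.

    config = {
        "prompt": "What is the capital of {country}?",
        "tests": [
            {"vars": {"country": "France"}, "assert_contains": "Paris"},
            {"vars": {"country": "Japan"}, "assert_contains": "Tokyo"},
        ],
    }

    def call_llm(prompt: str) -> str:
        """Stand-in for a real provider call (GPT, Claude, Gemini, Llama, ...)."""
        canned = {"France": "Paris is the capital.", "Japan": "Tokyo is the capital."}
        return canned[prompt.split()[-1].rstrip("?")]

    for test in config["tests"]:
        prompt = config["prompt"].format(**test["vars"])
        output = call_llm(prompt)
        status = "PASS" if test["assert_contains"] in output else "FAIL"
        print(f"{status}: {prompt!r} -> {output!r}")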

parea-sdk-py

74 Stars · 6 Forks

Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
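
As a rough picture of the test-and-evaluate loop an SDK like this wraps, here is a minimal, self-contained Python sketch; all names in it (EvalCase, run_app, exact_match) are invented for illustration and are not parea-sdk-py's API.

    # Hypothetical sketch of the evaluation loop an LLM-app SDK wraps.
    # None of these names come from parea-sdk-py; they are illustrative only.

    from dataclasses import dataclass

    @dataclass
    class EvalCase:
        input: str        # prompt or user message fed to the app
        expected: str     # reference answer used by the scorer

    def run_app(user_input: str) -> str:
        """Stand-in for the LLM-powered application under test."""
        return "Paris" if "capital of France" in user_input else "unknown"

    def exact_match(output: str, expected: str) -> float:
        """Simplest possible scorer: 1.0 on exact match, else 0.0."""
        return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0

    cases = [
        EvalCase("What is the capital of France?", "Paris"),
        EvalCase("What is the capital of Spain?", "Madrid"),
    ]

    scores = [exact_match(run_app(c.input), c.expected) for c in cases]
    print(f"accuracy: {sum(scores) / len(scores):.2f}")  # here: accuracy: 0.50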

agentic_security

1.7k Stars · 218 Forks

Agentic LLM Vulnerability Scanner / AI red teaming kit 🧪

MixEval

219 Stars · 32 Forks

The official evaluation suite and dynamic data release for MixEval.

KIEval

38 Stars · 2 Forks

[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
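
KIEval's core move is to replace a static Q&A benchmark with a multi-round dialogue: an LLM-powered "interactor" poses knowledge-grounded follow-up questions, the model under test answers, and a judge scores the full exchange. The Python below mimics only that loop structure with stub functions; it is a sketch, not the paper's implementation.

    # Sketch of a knowledge-grounded interactive evaluation loop in the
    # spirit of KIEval; interactor, candidate, and judge are stubs here,
    # where the real framework uses LLMs in each role.

    def interactor(topic: str, history: list[tuple[str, str]]) -> str:
        """Asks progressively deeper follow-ups grounded in the topic."""
        round_no = len(history) + 1
        return f"Round {round_no}: explain more about {topic}."

    def candidate(question: str) -> str:
        """Stand-in for the model under evaluation."""
        return f"(candidate answer to: {question})"

    def judge(history: list[tuple[str, str]]) -> float:
        """Scores the whole dialogue; stubbed to a constant here."""
        return 1.0 if history else 0.0

    def evaluate(topic: str, rounds: int = 3) -> float:
        history: list[tuple[str, str]] = []
        for _ in range(rounds):
            question = interactor(topic, history)
            answer = candidate(question)
            history.append((question, answer))
        return judge(history)

    print(evaluate("photosynthesis"))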

fm-leaderboarder

19 Stars · 5 Forks

FM-Leaderboard-er lets you build a leaderboard to find the best LLM/prompt for your own business use case, based on your data, tasks, and prompts.
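
The underlying leaderboard mechanics are simple to sketch: score every (model, prompt) pair on your own cases, then rank by mean score. The Python below is a generic illustration with made-up sample scores; it is not FM-Leaderboard-er's actual interface.

    # Generic leaderboard aggregation, illustrative only:
    # rank (model, prompt) combinations by mean score over your cases.

    from statistics import mean

    # results[(model, prompt_id)] = per-case scores from whatever metric you use
    results: dict[tuple[str, str], list[float]] = {
        ("gpt-4o", "prompt_v1"): [1.0, 0.0, 1.0],
        ("gpt-4o", "prompt_v2"): [1.0, 1.0, 1.0],
        ("claude-3-5-sonnet", "prompt_v1"): [1.0, 1.0, 0.0],
    }

    leaderboard = sorted(results.items(), key=lambda kv: mean(kv[1]), reverse=True)
    for (model, prompt_id), scores in leaderboard:
        print(f"{model:>20} {prompt_id:>10} mean={mean(scores):.2f}")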

realign

18 Stars · 1 Fork

Realign is a testing and simulation framework for AI applications.