llm-evaluation-framework topic
promptfoo
Test your prompts, agents, and RAG pipelines. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command-line and CI/CD integration.
deepeval
The LLM Evaluation Framework
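As a flavor of what such a framework looks like in practice, here is a minimal deepeval-style check: a test case with an input, the model's actual output, and retrieval context, scored by a relevancy metric. The field and metric names follow deepeval's documented patterns, but treat the exact imports and arguments as a sketch from memory rather than a verified snippet; the built-in metrics call an LLM judge, so an API key is expected in the environment.

```python
# Minimal sketch of a deepeval-style evaluation (names assumed from deepeval's
# documented usage; default metrics call an LLM judge, so OPENAI_API_KEY is
# expected to be set).
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="What is your return policy?",
    actual_output="You can return any item within 30 days for a full refund.",
    retrieval_context=["All purchases can be returned within 30 days."],
)

# Score how relevant the answer is to the input; the case fails below the threshold.
evaluate(test_cases=[test_case], metrics=[AnswerRelevancyMetric(threshold=0.7)])
```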
parea-sdk-py
Python SDK for experimenting with, testing, evaluating, and monitoring LLM-powered applications - Parea AI (YC S23)
MixEval
The official evaluation suite and dynamic data release for MixEval.
KIEval
[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
fm-leaderboarder
FM-Leaderboard-er lets you build a leaderboard to find the best LLM/prompt combination for your own business use case, based on your own data, tasks, and prompts.