llm-evaluation topic
Awesome-LLM-in-Social-Science
Awesome papers involving LLMs in Social Science.
promptfoo
Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command...
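As a rough illustration of the declarative configs the description mentions, here is a minimal sketch of a promptfoo `promptfooconfig.yaml`. The prompt text, variable values, and model IDs below are made-up placeholders; the `prompts`/`providers`/`tests` layout follows promptfoo's documented config format, but check the project's docs for the fields supported by your version.

```yaml
# promptfooconfig.yaml — minimal sketch (placeholder prompt, vars, and models)
prompts:
  - "Summarize the following text in one sentence: {{text}}"

providers:
  - openai:gpt-4o-mini          # models to compare; IDs are examples
  - anthropic:messages:claude-3-5-sonnet-20241022

tests:
  - vars:
      text: "Large language models are evaluated with automated metrics and human review."
    assert:
      - type: contains          # simple string check on the model output
        value: "evaluated"
```

Running `promptfoo eval` against such a config executes each prompt/provider pair and reports the assertion results side by side.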
continuous-eval
Data-Driven Evaluation for LLM-Powered Applications
DCR-consistency
DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models
CommonGen-Eval
Evaluating LLMs with CommonGen-Lite
deepeval
The LLM Evaluation Framework
athina-evals
Python SDK for running evaluations on LLM generated responses
just-eval
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
agenta
The all-in-one LLM developer platform: prompt management, evaluation, human feedback, and deployment.
leaf-playground
A framework for building scenario simulation projects in which both human and LLM-based agents can participate, with a user-friendly web UI to visualize simulations; supports automatic evaluation of agent ac...