llm-evaluation topic
Awesome-LLM-in-Social-Science
Awesome papers involving LLMs in Social Science.
promptfoo
Test your prompts, models, and RAGs. Catch regressions and improve prompt quality. LLM evals for OpenAI, Azure, Anthropic, Gemini, Mistral, Llama, Bedrock, Ollama, and other local & private models.
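promptfoo is driven by a declarative config file. A minimal sketch of a `promptfooconfig.yaml`, assuming the provider id and assertion values shown here (they are illustrative, not prescriptive):

```yaml
# Hypothetical minimal promptfoo config: one prompt template,
# one provider, one test case with a simple assertion.
prompts:
  - "Answer concisely: {{question}}"

providers:
  - openai:gpt-4o-mini   # illustrative provider id

tests:
  - vars:
      question: "What is the capital of France?"
    assert:
      - type: contains
        value: "Paris"
```

Running `promptfoo eval` against such a config executes each test case and reports pass/fail per assertion, which is how regressions are caught as prompts change.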
continuous-eval
Open-Source Evaluation for GenAI Application Pipelines
DCR-consistency
DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models
CommonGen-Eval
Evaluating LLMs with CommonGen-Lite
deepeval
The LLM Evaluation Framework
athina-evals
Python SDK for running evaluations on LLM generated responses
just-eval
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
agenta
The all-in-one LLM developer platform: prompt management, evaluation, human feedback, and deployment in one place.
leaf-playground
A framework for building scenario-simulation projects in which human and LLM-based agents can participate, with a user-friendly web UI to visualize simulations and support for automatic agent evaluation.
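To illustrate what the evaluation frameworks above automate, here is a minimal sketch of an eval loop: run each test case through a model and score the output with a metric. The model function and the metric are hypothetical stand-ins, not the API of any listed tool.

```python
# Minimal sketch of an LLM eval harness.
# `fake_model` is a hypothetical stand-in for a real LLM call
# (e.g., an OpenAI, Anthropic, or local-model API request).

def fake_model(prompt: str) -> str:
    # Canned responses so the sketch runs offline.
    canned = {
        "What is 2+2?": "4",
        "Capital of France?": "Paris",
    }
    return canned.get(prompt, "I don't know.")

def contains_metric(output: str, expected: str) -> bool:
    # A simple pass/fail metric; real frameworks ship many more
    # (relevancy, faithfulness, consistency, ...).
    return expected.lower() in output.lower()

def run_evals(model, cases):
    # cases: list of (prompt, expected substring) pairs.
    results = []
    for prompt, expected in cases:
        output = model(prompt)
        results.append({
            "prompt": prompt,
            "output": output,
            "passed": contains_metric(output, expected),
        })
    return results

cases = [("What is 2+2?", "4"), ("Capital of France?", "Paris")]
report = run_evals(fake_model, cases)
print(f"{sum(r['passed'] for r in report)}/{len(report)} passed")
```

The listed tools build on this same loop, adding model adapters, richer metrics (often LLM-judged), regression tracking, and reporting UIs.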