llm-eval topic
giskard
🐢 Open-Source Evaluation & Testing for ML & LLM systems
phoenix
AI Observability & Evaluation (a minimal launch sketch appears after this list)
uptrain
UpTrain is an open-source, unified platform for evaluating and improving Generative AI applications. It provides grades for 20+ preconfigured checks covering language, code, and embedding use cases, and performs root-cause analysis on failing cases (a usage sketch appears after this list).
promptfoo
Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command-line and CI/CD integration.
athina-evals
Python SDK for running evaluations on LLM-generated responses
just-eval
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
rulm-sbs2
A benchmark comparing Russian ChatGPT analogues: Saiga, YandexGPT, Gigachat
parea-sdk-py
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
ragrank
🎯 A free LLM evaluation toolkit for assessing factual accuracy, context understanding, tone, and more, so you can gauge the quality of your LLM applications.
prompto
An open source library for asynchronous querying of LLM endpoints
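Phoenix (listed above) runs as a local observability app. A minimal sketch of starting it, assuming the `arize-phoenix` package is installed and using its documented `launch_app()` entry point:

```python
# Minimal sketch: start the Phoenix observability UI locally.
# Assumes `pip install arize-phoenix`; traces and evals can then be sent to the running session.
import phoenix as px

session = px.launch_app()  # launches the local Phoenix app
print(session.url)         # open this URL to inspect traces and evaluations
```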
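And a hedged sketch of UpTrain's preconfigured checks, assuming the `uptrain` package and an OpenAI API key; the check names come from its `Evals` enum, and the sample data is purely illustrative:

```python
# Hedged sketch: grade a response with a few of UpTrain's preconfigured checks.
from uptrain import EvalLLM, Evals

eval_llm = EvalLLM(openai_api_key="sk-...")  # placeholder key

data = [{
    "question": "What is the capital of France?",
    "context": "France's capital and largest city is Paris.",
    "response": "The capital of France is Paris.",
}]

results = eval_llm.evaluate(
    data=data,
    checks=[Evals.CONTEXT_RELEVANCE, Evals.FACTUAL_ACCURACY, Evals.RESPONSE_RELEVANCE],
)
print(results)  # one score per check for each row
```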