llm-eval topic
giskard
🐢 Open-Source Evaluation & Testing for LLMs and ML models
phoenix
AI Observability & Evaluation
uptrain
UpTrain is an open-source unified platform to evaluate and improve Generative AI applications, providing grades for 20+ preconfigured checks covering language, code, and embedding use-cases.
promptfoo
Test your prompts, models, and RAGs. Catch regressions and improve prompt quality. LLM evals for OpenAI, Azure, Anthropic, Gemini, Mistral, Llama, Bedrock, Ollama, and other local & private models.
athina-evals
Python SDK for running evaluations on LLM generated responses
just-eval
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
rulm-sbs2
A benchmark comparing Russian ChatGPT analogues: Saiga, YandexGPT, GigaChat
parea-sdk-py
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
ragrank
🎯 A free LLM evaluation toolkit for assessing your LLM application's factual accuracy, context understanding, tone, and more.
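Most of the toolkits above share the same core pattern: run the model over a set of test cases, score each output against one or more checks, and aggregate a pass rate. A minimal, library-agnostic sketch of that loop (plain Python, not the API of any specific project listed here; `generate` stands in for whatever model-call function you use):

```python
# Minimal sketch of the eval loop shared by the toolkits above.
# `generate` is a hypothetical model-call function supplied by the caller.

def contains_all(output: str, required: list[str]) -> bool:
    """Pass if every required substring appears in the model output."""
    return all(term.lower() in output.lower() for term in required)

def run_eval(cases: list[dict], generate) -> float:
    """Run `generate` over test cases and return the fraction passing."""
    passed = 0
    for case in cases:
        output = generate(case["prompt"])
        if contains_all(output, case["expect"]):
            passed += 1
    return passed / len(cases)

# Usage with a stubbed model in place of a real LLM call:
cases = [
    {"prompt": "Capital of France?", "expect": ["Paris"]},
    {"prompt": "2 + 2 = ?", "expect": ["4"]},
]
score = run_eval(cases, generate=lambda p: "Paris" if "France" in p else "4")
print(score)  # 1.0
```

Real frameworks layer richer checks (semantic similarity, LLM-as-judge grading, regression diffs) on top of this same run-then-score structure.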