evals topic
evals repositories
agentops
5.1k
Stars
490
Forks
5.1k
Watchers
Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks including CrewAI, Agno, OpenAI Agents SDK, Langchain, Autogen, AG2, and Ca...
langfuse
14.4k
Stars
1.3k
Forks
🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
vivaria
53
Stars
15
Forks
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
rag-evaluator
21
Stars
13
Forks
A library for evaluating Retrieval-Augmented Generation (RAG) systems (The traditional ways).