evals topic
evals repositories
agentops
1.8k
Stars
170
Forks
Python SDK for agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks, including CrewAI, LangChain, and AutoGen.
langfuse
5.7k
Stars
535
Forks
18
Watchers
🪢 Open source LLM engineering platform: LLM observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, LangChain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
vivaria
53
Stars
15
Forks
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
rag-evaluator
21
Stars
13
Forks
A library for evaluating Retrieval-Augmented Generation (RAG) systems (using traditional methods).