evals topic

List evals repositories

agentops

1.8k stars, 170 forks

Python SDK for agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks, including CrewAI, LangChain, and AutoGen.

langfuse

5.7k stars, 535 forks, 18 watchers

🪢 Open source LLM engineering platform: LLM observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, LangChain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

vivaria

53 stars, 15 forks

Vivaria is METR's tool for running evaluations and conducting agent elicitation research.

rag-evaluator

21 stars, 13 forks

A library for evaluating Retrieval-Augmented Generation (RAG) systems using traditional methods.

mastra

4.2k stars, 181 forks, 24 watchers

The TypeScript AI agent framework.