evals topic

List evals repositories

agentops

3.8k
Stars
336
Forks
42
Watchers

Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks, including OpenAI Agents SDK, CrewAI, LangChain, AutoGen, AG2, and CamelAI.

langfuse

14.4k
Stars
1.3k
Forks
44
Watchers

🪢 Open source LLM engineering platform: LLM observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, LangChain, OpenAI SDK, LiteLLM, and more. 🍊 YC W23

vivaria

53
Stars
15
Forks

Vivaria is METR's tool for running evaluations and conducting agent elicitation research.

rag-evaluator

21
Stars
13
Forks

A library for evaluating Retrieval-Augmented Generation (RAG) systems (the traditional way).

mastra

11.2k
Stars
529
Forks
44
Watchers

The TypeScript AI agent framework. ⚡ Assistants, RAG, observability. Supports any LLM: GPT-4, Claude, Gemini, Llama.