agent-evaluation topics

Mathematical benchmark exposing the massive performance gap between real agents and LLM wrappers. Rigorous multi-dimensional evaluation with statistical validation (95% CI, Cohen's h) and reproducible...

Cre4T3Tiv3

agent-architecture

agent-benchmark

agent-evaluation

agent-performance

coze-loop

5.3k stars, 723 forks, 5.3k watchers

Next-generation AI Agent Optimization Platform: Cozeloop addresses challenges in AI agent development by providing full-lifecycle management capabilities from development, debugging, and evaluation to...

coze-dev

agent

agent-evaluation

agent-observability

agentops

awesome-ai-agent-testing

23 stars, 4 forks, 23 watchers

🤖 A curated list of resources for testing AI agents: frameworks, methodologies, benchmarks, tools, and best practices for ensuring reliable, safe, and effective autonomous AI systems.

chaosync-org

agent-evaluation

agentic-ai

ai-agents

ai-benchmark

eval-view

31 stars, 3 forks, 31 watchers

Catch AI agent regressions before you ship. YAML test cases, golden baselines, execution tracing, cost tracking, CI integration. LangGraph, CrewAI, Anthropic, OpenAI.

hidai25

agent

agent-benchmark

agent-evaluation

agentic-ai