agent-evaluation topic
agent-evaluation repositories
ai-agents-reality-check
Stars: 48 | Forks: 0 | Watchers: 48
Mathematical benchmark exposing the massive performance gap between real agents and LLM wrappers. Rigorous multi-dimensional evaluation with statistical validation (95% CI, Cohen's h) and reproducible...
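The description mentions 95% confidence intervals and Cohen's h. The following is a minimal sketch of both statistics for comparing two task-success rates. It is not code from ai-agents-reality-check. The Wilson score interval is an assumed choice of CI method, and the success counts in the usage line are hypothetical.

```python
import math


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h effect size between two proportions: 2*asin(sqrt(p1)) - 2*asin(sqrt(p2))."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion; z=1.96 gives roughly 95% coverage."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Hypothetical counts: an agent solves 80/100 tasks, a wrapper solves 50/100.
h = cohens_h(80 / 100, 50 / 100)
agent_ci = wilson_ci(80, 100)
wrapper_ci = wilson_ci(50, 100)
print(f"Cohen's h = {h:.3f}, agent CI = {agent_ci}, wrapper CI = {wrapper_ci}")
```

Cohen's h is used here rather than a raw difference because the arcsine transform puts differences between proportions on a scale that is comparable near 0, 0.5 and 1.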
coze-loop
Stars: 5.2k | Forks: 705 | Watchers: 5.2k
Next-generation AI Agent Optimization Platform: Cozeloop addresses challenges in AI agent development by providing full-lifecycle management capabilities from development, debugging, and evaluation to...