agent-performance topic

ai-agents-reality-check

Stars: 48
Forks: 0
Watchers: 48

Mathematical benchmark exposing the massive performance gap between real agents and LLM wrappers. Rigorous multi-dimensional evaluation with statistical validation (95% CI, Cohen's h) and reproducible...