12-factor-agents

Proposal: introduce evals to the 12-factor app

Open · Viktor286 opened this issue 4 months ago · 1 comment

This proposal suggests adding evaluation tests (evals) as part of the context of an agent app.

General concept:

  • https://platform.openai.com/docs/guides/evals-design#multi-agent-architectures
  • https://cookbook.openai.com/examples/evaluation/getting_started_with_openai_evals

The main idea of the PR is that evals convey context-specific information that reflects the alignment of the agent.
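As a hedged sketch of what such an eval might look like in an agent app (the `classify_ticket` agent function, the cases, and the scoring here are all illustrative, not part of this proposal):

```python
# Minimal eval sketch: run an agent function over labeled cases and
# report an accuracy score. All names and cases are illustrative.

def classify_ticket(text: str) -> str:
    # Stand-in for a real agent call (e.g., an LLM-backed classifier).
    return "billing" if "invoice" in text.lower() else "other"

EVAL_CASES = [
    {"input": "Where is my invoice for March?", "expected": "billing"},
    {"input": "The app crashes on startup", "expected": "other"},
]

def run_evals(agent, cases):
    """Return the fraction of cases where the agent's output matches."""
    passed = sum(1 for c in cases if agent(c["input"]) == c["expected"])
    return passed / len(cases)

if __name__ == "__main__":
    score = run_evals(classify_ticket, EVAL_CASES)
    print(f"eval accuracy: {score:.2f}")
```

Keeping a case set like `EVAL_CASES` alongside the agent code is one way evals could carry the context-specific alignment information described above.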

In the original 12-factor app, testing is implicitly supported and encouraged through several of the factors:

Factor 10 (Dev/Prod Parity): This encourages keeping development, staging, and production environments as similar as possible. It implies that automated tests should run in environments that closely resemble production.

Factor 5 (Build, Release, Run): Since the build phase includes compiling code and running tests, a solid CI/CD pipeline that enforces testing fits naturally here.

Factor 12 (Admin Processes): You could technically run test scripts as one-off admin processes.
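In that spirit, an eval suite could run as a one-off process that fails the run when scores drop below a threshold, so CI can gate a release on it. This is a sketch under assumed conventions; the threshold and scores are illustrative:

```python
# Hypothetical one-off eval gate (Factor 5 / Factor 12): exit nonzero
# when the mean eval score falls below a threshold, so a CI pipeline
# can block the release. Values below are illustrative.
import sys

def gate(scores, threshold=0.9):
    """Return True if the mean eval score meets the threshold."""
    mean = sum(scores) / len(scores)
    return mean >= threshold

if __name__ == "__main__":
    # A real run would collect these scores from the eval suite.
    scores = [1.0, 0.9, 0.8]
    ok = gate(scores)
    print("evals passed" if ok else "evals failed")
    sys.exit(0 if ok else 1)
```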

Viktor286 · Jul 27 '25 14:07

CLA assistant check
All committers have signed the CLA.

CLAassistant · Jul 27 '25 14:07