
Add Terminal-Bench sandbox example environment

Open willccbb opened this issue 4 months ago • 0 comments

Summary

  • add a configurable SandboxEnv wrapper for Terminal-Bench tasks that stages assets in a fresh sandbox and runs the official tests during post-rollout
  • cache the test outcome so a simple rubric can score it, and ship a single-example dataset for the requested task ID
  • document the environment and declare its dependencies

Testing

  • uv run ruff check environments/terminal_bench/terminal_bench.py

https://chatgpt.com/codex/tasks/task_e_68ea1a482a388326932b1fb2fa1d666d

willccbb, Oct 11 '25 09:10