verifiers
[FEAT] new --save-dataset-cache arg
Description
Added a new `--save-dataset-cache` argument to eval.py. Centralizing eval results across verifiers usage, or at least providing the option to, is generally good, I think.
Type of Change
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Documentation update
- [ ] Test improvement
Testing
- [x] All existing tests pass when running `uv run pytest` locally.
- [ ] New tests have been added to cover the changes
Checklist
- [ ] My code follows the style guidelines of this project as outlined in AGENTS.md
- [x] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] Any dependent changes have been merged and published
Additional Notes
New args:
vf-eval my-env -sc
vf-eval my-env --save-dataset-cache
Defaults to using ~/.cache/verifiers/
christian@Christians-MacBook-Pro:~/.cache/verifiers$ ls
evals
christian@Christians-MacBook-Pro:~/.cache/verifiers$ ls evals
index.json mcp-env--gpt-4.1-mini
christian@Christians-MacBook-Pro:~/.cache/verifiers$ cat evals/index.json
{
  "mcp-env--gpt-4.1-mini/82149437": {
    "env": "mcp-env",
    "model": "gpt-4.1-mini",
    "timestamp": "2025-09-20T01:15:21.271069",
    "path": "/Users/christian/.cache/verifiers/evals/mcp-env--gpt-4.1-mini/82149437",
    "avg_reward": 0.0,
    "num_examples": 1,
    "rollouts_per_example": 1
  }
}
christian@Christians-MacBook-Pro:~/.cache/verifiers$ ls evals/mcp-env--gpt-4.1-mini
82149437 bf98fd91
christian@Christians-MacBook-Pro:~/.cache/verifiers$ ls evals/mcp-env--gpt-4.1-mini/82149437
metadata.json results.jsonl
christian@Christians-MacBook-Pro:~/.cache/verifiers$
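For anyone who wants to consume the cache from a script, here is a rough sketch of reading `evals/index.json`. The field names (`env`, `model`, `timestamp`, `avg_reward`, `path`) are taken from the sample output above; the helper names `load_index` and `runs_for` are hypothetical and not part of this PR, and the index schema may change.

```python
import json
from pathlib import Path

# Default location as shown in the listing above (assumed, not read from the code).
DEFAULT_INDEX = Path.home() / ".cache" / "verifiers" / "evals" / "index.json"


def load_index(index_path=DEFAULT_INDEX):
    """Return the index as a dict, or {} if nothing has been cached yet."""
    index_path = Path(index_path)
    if not index_path.exists():
        return {}
    with index_path.open() as f:
        return json.load(f)


def runs_for(index, env, model=None):
    """Return (run_key, entry) pairs for an env, oldest first.

    Timestamps in the sample are ISO 8601 strings, so sorting them as
    plain strings gives chronological order.
    """
    matches = [
        (key, entry)
        for key, entry in index.items()
        if entry.get("env") == env
        and (model is None or entry.get("model") == model)
    ]
    return sorted(matches, key=lambda pair: pair[1].get("timestamp", ""))


if __name__ == "__main__":
    for key, entry in runs_for(load_index(), "mcp-env"):
        print(key, entry.get("avg_reward"), entry.get("path"))
```

Keeping the lookup keyed on the index file rather than walking the directory tree means a script only has to open one file to find every cached run.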