
Add support for saving intermediate results during vf-eval

Open bsagevedant opened this issue 6 months ago • 2 comments

Save Intermediate Results During vf-eval

This PR addresses issue #251 by adding support for saving intermediate results during evaluation and enabling interleaved reward computation.

Changes

  • Added two configuration options to the Environment class:
    • save_intermediate: save intermediate results during rollouts
    • interleave_rewards: compute rewards after each rollout instead of in a single batch
  • Modified the run_rollouts method to:
    • Support saving intermediate results after each rollout
    • Support interleaving reward computation
    • Make both features optional and configurable
  • Added tests in test_intermediate_results.py
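
As a rough illustration of how the two flags described above could interact, here is a minimal, self-contained sketch. It is not the code from this PR: `SketchEnvironment`, `rollout_fn`, `reward_fn`, and the record fields are hypothetical stand-ins, and only the flag names `save_intermediate` and `interleave_rewards` and the method name `run_rollouts` come from the description.

```python
import json
import logging
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SketchEnvironment:
    """Hypothetical stand-in; NOT the verifiers Environment class."""

    rollout_fn: Callable[[str], str]        # assumed: prompt -> completion
    reward_fn: Callable[[str, str], float]  # assumed: (prompt, completion) -> reward
    save_intermediate: bool = False
    interleave_rewards: bool = False
    logger: logging.Logger = field(
        default_factory=lambda: logging.getLogger("sketch_env")
    )

    def run_rollouts(self, prompts: List[str]) -> List[dict]:
        results = []
        for i, prompt in enumerate(prompts):
            completion = self.rollout_fn(prompt)
            record = {"index": i, "prompt": prompt,
                      "completion": completion, "reward": None}
            if self.interleave_rewards:
                # Score this rollout immediately instead of waiting for the batch.
                record["reward"] = self.reward_fn(prompt, completion)
            if self.save_intermediate:
                # Emit the record as soon as the rollout finishes.
                self.logger.info("intermediate result: %s", json.dumps(record))
            results.append(record)
        if not self.interleave_rewards:
            # Batched mode: score everything once all rollouts are done.
            for record in results:
                record["reward"] = self.reward_fn(
                    record["prompt"], record["completion"]
                )
        return results
```

In this sketch, batched mode logs intermediate records with `reward` still set to `None`, since scoring only happens after the loop. Whether the PR behaves the same way is not stated in the description.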

Testing

Added new test cases that verify:

  • Intermediate results saving functionality
  • Interleaved reward computation
  • Configuration options
  • Integration with existing evaluation methods
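
For readers who want a feel for what a test of intermediate saving might look like, the sketch below captures log output in memory and compares it with the returned results. It does not reproduce test_intermediate_results.py: the rollout loop, the logger name, and the record layout are all assumptions made for illustration, and only the standard-library `logging` and `json` modules are used.

```python
import json
import logging
from typing import List


class ListHandler(logging.Handler):
    """Collects log messages in memory so a test can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def run_with_intermediate_logging(prompts: List[str],
                                  logger: logging.Logger) -> List[dict]:
    """Hypothetical rollout loop: logs one JSON record per finished rollout."""
    results = []
    for i, prompt in enumerate(prompts):
        # Reversing the prompt stands in for a real model call.
        result = {"index": i, "prompt": prompt, "completion": prompt[::-1]}
        logger.info(json.dumps(result))
        results.append(result)
    return results


def test_intermediate_results_are_logged() -> None:
    logger = logging.getLogger("sketch.intermediate")  # assumed logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = ListHandler()
    logger.addHandler(handler)
    try:
        results = run_with_intermediate_logging(["abc", "de"], logger)
    finally:
        logger.removeHandler(handler)
    # Every finished rollout should have produced exactly one logged record.
    logged = [json.loads(m) for m in handler.messages]
    assert logged == results
```

The same handler approach would let a user redirect intermediate results to a file or another sink by attaching a different `logging.Handler`.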

Notes

  • Interleaved reward computation is optional because it is not fully compatible with some pairwise reward strategies, which score a rollout by comparing it with other rollouts
  • Intermediate results are logged using the environment's logger, which can be customized by the user
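
To make the pairwise caveat concrete, the toy example below scores each completion by the fraction of the other completions it beats. The function names and the length-based preference are purely illustrative assumptions, not a reward strategy from the library; the point is only that a score computed from a partial group can differ from the score computed once all rollouts are available.

```python
from typing import Callable, List


def pairwise_win_rates(completions: List[str],
                       prefer: Callable[[str, str], bool]) -> List[float]:
    """Score each completion by the fraction of the others it beats."""
    n = len(completions)
    if n < 2:
        # With fewer than two completions there is nothing to compare against.
        return [0.0] * n
    scores = []
    for i, a in enumerate(completions):
        wins = sum(
            1 for j, b in enumerate(completions) if i != j and prefer(a, b)
        )
        scores.append(wins / (n - 1))
    return scores


def longer_is_better(a: str, b: str) -> bool:
    """Toy preference used only for illustration."""
    return len(a) > len(b)


group = ["aa", "a", "aaa"]

# Scores after only the first two rollouts have finished.
partial = pairwise_win_rates(group[:2], longer_is_better)  # [1.0, 0.0]

# Scores once the whole group is available.
full = pairwise_win_rates(group, longer_is_better)  # [0.5, 0.0, 1.0]
```

Here the first completion scores 1.0 when judged against one peer and 0.5 against the full group, which is why scoring rollouts one at a time can disagree with batched scoring for rewards of this kind.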

bsagevedant avatar Sep 24 '25 03:09 bsagevedant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

:white_check_mark: willccbb
:x: Your GitHub Username


Your GitHub Username does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

CLAassistant avatar Sep 24 '25 03:09 CLAassistant

nice! looks pretty good, updated to merge with latest main. probably will make some other edits before merging: our logic for saving vf-eval JSON outputs has drifted a bit from make_dataset, and ideally we bring these back in sync so that intermediate saving handles vf-eval -s directly.

willccbb avatar Sep 30 '25 07:09 willccbb