Add support for saving intermediate results during vf-eval
Save Intermediate Results During vf-eval
This PR addresses issue #251 by adding support for saving intermediate results during evaluation and enabling interleaved reward computation.
Changes
- Added configuration options to the Environment class:
  - save_intermediate: Enable saving intermediate results during rollouts
  - interleave_rewards: Compute rewards after each rollout instead of in a single batch
- Modified the run_rollouts method to:
  - Support saving intermediate results after each rollout
  - Support interleaved reward computation
  - Make both features optional and configurable
- Added comprehensive tests in test_intermediate_results.py
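As a rough illustration, here is one way the two flags might interact inside a rollout loop. This is a minimal sketch under stated assumptions: SketchEnvironment, rollout_fn, reward_fn, the in-memory saved list and the logger name are all hypothetical stand-ins, not the real Environment class or the verifiers API. Only the flag names save_intermediate and interleave_rewards come from this PR.

```python
import json
import logging
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical logger name; the real environment's logger may differ.
logger = logging.getLogger("intermediate_results_sketch")


@dataclass
class SketchEnvironment:
    """Hypothetical stand-in for Environment, for illustration only."""
    rollout_fn: Callable[[str], str]           # produces a completion for a prompt
    reward_fn: Callable[[str, str], float]     # scores (prompt, completion)
    save_intermediate: bool = False
    interleave_rewards: bool = False
    saved: List[Dict] = field(default_factory=list)  # intermediate snapshots

    def run_rollouts(self, prompts: List[str]) -> List[Dict]:
        results: List[Dict] = []
        for i, prompt in enumerate(prompts):
            completion = self.rollout_fn(prompt)
            record = {"index": i, "prompt": prompt,
                      "completion": completion, "reward": None}
            if self.interleave_rewards:
                # Score right away so partial runs already carry rewards.
                record["reward"] = self.reward_fn(prompt, completion)
            if self.save_intermediate:
                # Store a copy so the later batch scoring does not mutate it.
                self.saved.append(dict(record))
                logger.info(json.dumps(record))
            results.append(record)
        if not self.interleave_rewards:
            # Default path: score everything in one batch after all rollouts.
            for record in results:
                record["reward"] = self.reward_fn(record["prompt"],
                                                  record["completion"])
        return results


env = SketchEnvironment(rollout_fn=lambda p: p.upper(),
                        reward_fn=lambda p, c: float(len(c)),
                        save_intermediate=True, interleave_rewards=True)
results = env.run_rollouts(["ab", "cde"])
```

With interleave_rewards disabled, the saved snapshots carry a reward of None because scoring is deferred to the batch pass. With it enabled, each snapshot already includes its reward, which is what makes partial results useful if a long evaluation is interrupted.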
Testing
Added new test cases that verify:
- Intermediate results saving functionality
- Interleaved reward computation
- Configuration options
- Integration with existing evaluation methods
Notes
- Interleaved reward computation is optional because it is not fully compatible with some pairwise reward strategies
- Intermediate results are logged using the environment's logger, which can be customized by the user
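To see why interleaving can conflict with pairwise rewards, consider a hypothetical win-rate reward that scores each rollout against the others in its group. The function below is an illustration only, not a reward strategy from this codebase: a rollout's reward depends on every other member of the group, so it cannot be finalized right after that single rollout completes.

```python
from typing import List


def pairwise_win_rate(scores: List[float]) -> List[float]:
    """Hypothetical pairwise reward: the fraction of the other rollouts in
    the group that each rollout beats (ties count as half a win)."""
    n = len(scores)
    if n < 2:
        raise ValueError("pairwise rewards need at least two rollouts")
    rewards = []
    for i, s in enumerate(scores):
        wins = sum(1.0 for j, t in enumerate(scores) if j != i and s > t)
        ties = sum(0.5 for j, t in enumerate(scores) if j != i and s == t)
        rewards.append((wins + ties) / (n - 1))
    return rewards


# The same rollout (score 0.2) gets a different reward depending on the group.
print(pairwise_win_rate([0.2, 0.9, 0.5]))  # [0.0, 1.0, 0.5]
print(pairwise_win_rate([0.2, 0.1]))       # [1.0, 0.0]
```

Because the first rollout's reward changes from 0.0 to 1.0 depending on the rest of the group, a reward like this has to wait for the full batch, which is why interleave_rewards stays opt-in.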
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers has signed the CLA.
✅ willccbb
❌ Your GitHub Username
Your GitHub Username seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
Nice! Looks pretty good. I updated it to merge with the latest main. I'll probably make some other edits before merging: our logic for saving vf-eval JSON outputs has drifted a bit from make_dataset, and ideally we bring these back in sync so that intermediate saving would handle vf-eval -s directly.