Adding rollout filtering to trainer

Open LuanBrt opened this issue 3 months ago • 2 comments

Description

This PR adds rollout filtering to the Trainer, following the RAGEN approach. It helps prevent model collapse by retaining only the top N% of tasks with the highest outcome uncertainty.
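To illustrate the idea, here is a minimal sketch of uncertainty-based filtering. It is not the code in this PR: the function name `filter_rollout_groups`, the `keep_fraction` parameter, and the dict-of-rewards input are hypothetical, and it assumes outcome uncertainty is measured as the standard deviation of rewards across a task's rollouts.

```python
import math
from statistics import pstdev


def filter_rollout_groups(rewards_by_task, keep_fraction=0.25):
    """Keep the tasks whose rollout rewards vary the most.

    rewards_by_task: dict mapping a task id to the list of rewards
        obtained from that task's rollouts (hypothetical layout).
    keep_fraction: fraction of tasks to retain, in (0, 1].
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")

    # Assumption: reward standard deviation is the uncertainty proxy.
    # Tasks where every rollout gets the same reward score 0.
    ranked = sorted(
        rewards_by_task.items(),
        key=lambda item: pstdev(item[1]),
        reverse=True,
    )

    # Round up so at least one task survives for non-empty input.
    n_keep = max(1, math.ceil(keep_fraction * len(ranked)))
    return dict(ranked[:n_keep])


rewards = {
    "task_a": [1.0, 1.0, 1.0, 1.0],  # always solved: no uncertainty
    "task_b": [0.0, 1.0, 0.0, 1.0],  # solved half the time
    "task_c": [0.0, 0.0, 0.0, 1.0],  # occasionally solved
    "task_d": [0.0, 0.0, 0.0, 0.0],  # never solved: no uncertainty
}
kept = filter_rollout_groups(rewards, keep_fraction=0.5)
print(list(kept))
```

Tasks that are always or never solved have zero reward spread, so they are dropped first; only the rollouts of the retained tasks would then be passed on to the policy update.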

Type of Change

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] Documentation update
  • [ ] Test improvement

Testing

  • [x] All existing tests pass
  • [ ] New tests have been added to cover the changes
  • [x] Tests have been run locally with python -m pytest tests/

Test Coverage

  • Current coverage: 100%
  • Coverage after changes: 100%

Checklist

  • [x] My code follows the style guidelines of this project
  • [x] I have performed a self-review of my own code
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [x] I have made corresponding changes to the documentation
  • [x] My changes generate no new warnings
  • [x] Any dependent changes have been merged and published

Additional Notes

LuanBrt avatar Sep 05 '25 19:09 LuanBrt

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Sep 05 '25 19:09 CLAassistant

Hey @willccbb, given the recent updates to verifiers, is this feature still considered desirable for inclusion in the Trainer? I would like to know whether I should close the PR; otherwise, I will try to make it compatible with the latest version.

LuanBrt avatar Oct 24 '25 19:10 LuanBrt