verifiers
Adding rollout filtering to trainer
Description
This PR adds rollout filtering to the Trainer, following the RAGEN approach. It helps prevent model collapse by retaining only the top N% of tasks, ranked by outcome uncertainty, and dropping the rest.
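To make the idea concrete, here is a minimal sketch of the filtering step, not the code in this PR. It assumes that "outcome uncertainty" is measured as the standard deviation of rewards across the rollouts of a single task. The function name `filter_tasks_by_uncertainty`, the `keep_fraction` parameter, and the dict-of-rewards input are hypothetical and chosen only for illustration.

```python
import math
from statistics import pstdev


def filter_tasks_by_uncertainty(rewards_by_task, keep_fraction=0.25):
    """Return the ids of the tasks whose rollout rewards vary the most.

    Hypothetical helper: `rewards_by_task` maps a task id to the list of
    rewards from that task's rollouts. Uncertainty is assumed to be the
    population standard deviation of those rewards.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not rewards_by_task:
        return []

    # One uncertainty score per task.
    uncertainty = {
        task_id: pstdev(rewards) for task_id, rewards in rewards_by_task.items()
    }

    # Keep at least one task so a batch is never emptied completely.
    n_keep = max(1, math.ceil(keep_fraction * len(uncertainty)))

    # Sort by uncertainty, highest first; ties keep their input order.
    ranked = sorted(uncertainty, key=uncertainty.get, reverse=True)
    return ranked[:n_keep]


rewards = {
    "a": [0, 0, 0, 0],  # always fails: no variance
    "b": [0, 1, 0, 1],  # mixed outcomes: highest variance
    "c": [1, 1, 1, 1],  # always succeeds: no variance
    "d": [0, 0, 0, 1],  # mostly fails: some variance
}
kept = filter_tasks_by_uncertainty(rewards, keep_fraction=0.5)
```

With `keep_fraction=0.5`, tasks `b` and `d` are kept. Tasks `a` and `c` are dropped because every rollout received the same reward, so under this assumption they carry no signal about which behaviour is better.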
Type of Change
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Documentation update
- [ ] Test improvement
Testing
- [x] All existing tests pass
- [ ] New tests have been added to cover the changes
- [x] Tests have been run locally with `python -m pytest tests/`
Test Coverage
- Current coverage: 100%
- Coverage after changes: 100%
Checklist
- [x] My code follows the style guidelines of this project
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [x] Any dependent changes have been merged and published
Additional Notes
Hey @willccbb, given the recent updates to verifiers, is this feature still desirable for inclusion in the Trainer? I'd like to know whether I should close the PR. Otherwise, I will try to make it compatible with the latest version.