[rollout] fix: some compatibility changes in agent loop and reward
What does this PR do?
Some compatibility changes, including:
- `agent_loop`:
  - compatible with models that have no system prompt
  - compatible with other multi-modal models that have a processor available
- `reward`:
  - allow `override_config` for Hugging Face models
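For reviewers unfamiliar with the idea, here is a minimal sketch of what applying an `override_config` to a model config can look like. It is not the code in this PR: `apply_override_config` is a hypothetical helper, and the `SimpleNamespace` object with its `num_labels` and `attention_dropout` fields is only a stand-in for a Hugging Face config loaded from a checkpoint.

```python
from types import SimpleNamespace


def apply_override_config(config, override_config=None):
    """Hypothetical helper: set each override as an attribute on the config.

    A missing or empty override_config leaves the config unchanged.
    """
    for key, value in (override_config or {}).items():
        setattr(config, key, value)
    return config


# Stand-in for a Hugging Face model config (assumed fields, for illustration only).
base = SimpleNamespace(num_labels=2, attention_dropout=0.1)

# Override one field; fields that are not overridden keep their original values.
updated = apply_override_config(base, {"attention_dropout": 0.0})
print(updated.attention_dropout, updated.num_labels)
```

Treating a missing `override_config` as an empty dict keeps the default path identical to the behavior without overrides.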
Test
- train Qwen VL and other internal multi-modal models with customized reward on agent loop
- CI
Checklist Before Submitting
[!IMPORTANT] Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
- [x] Read the Contribute Guide.
- [x] Apply pre-commit checks: `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [x] Add / Update the documentation.
- [x] Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in the `ci-request` channel in the `verl` Slack workspace. (If not accessible, please try the Feishu group (飞书群).)