[rollout] fix: some compatibility changes in agent loop and reward

Open • pengwu22 opened this issue 1 month ago • 1 comment

This PR makes the following compatibility changes:

agent_loop:
- support models without a system prompt
- support other multi-modal models that provide a processor
reward:
- allow override_config for Hugging Face models
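
A minimal sketch of two of the ideas above, not the actual diff in this PR. The helper names `strip_system_prefix` and `apply_override_config` are hypothetical, the token IDs are made up, and `SimpleNamespace` stands in for a Hugging Face config object. It assumes the agent loop removes a default system-prompt prefix from re-rendered messages, and that `override_config` is a plain dict of attribute overrides.

```python
from types import SimpleNamespace


def strip_system_prefix(token_ids, system_prefix_ids):
    """Drop a leading system-prompt prefix only if it is actually present.

    Hypothetical helper: a chat template without a default system prompt
    yields an empty prefix, in which case nothing is stripped.
    """
    n = len(system_prefix_ids)
    if n and token_ids[:n] == system_prefix_ids:
        return token_ids[n:]
    return token_ids


def apply_override_config(config, override_config=None):
    """Set each override on the config object in place and return it.

    Hypothetical helper: `config` is any attribute-style object.
    """
    for key, value in (override_config or {}).items():
        setattr(config, key, value)
    return config


# Template that injects a default system prompt (assumed prefix [1, 2]).
with_prefix = strip_system_prefix([1, 2, 7, 8], [1, 2])

# Template without a system prompt: empty prefix, tokens are left untouched.
without_prefix = strip_system_prefix([7, 8], [])

# Stand-in for a reward model config; only num_labels is overridden.
cfg = apply_override_config(
    SimpleNamespace(num_labels=2, attn_implementation="eager"),
    {"num_labels": 1},
)
```

With Hugging Face transformers, passing keyword arguments to `AutoConfig.from_pretrained` has a similar effect for attributes that exist on the config; whether this PR uses that route is not stated here.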

Tested by training Qwen VL and other internal multi-modal models with a customized reward on the agent loop, and by CI.

[!IMPORTANT] Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

[x] Read the Contribute Guide.
[x] Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
[x] Add / Update the documentation.
[x] Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: ...
[x] Once your PR is ready for CI, send a message in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)

Nov 25 '25 23:11 pengwu22

All committers have signed the CLA.

Nov 25 '25 23:11 CLAassistant