
[rollout] fix: some compatibility changes in agent loop and reward

Open pengwu22 opened this issue 1 month ago • 1 comment

What does this PR do?

This PR makes several compatibility changes:

  • agent_loop:
    • support models that have no system prompt
    • support other multi-modal models that have a processor available
  • reward:
    • allow override_config for Hugging Face models
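
A rough sketch of the two ideas above, for illustration only. It is not the code in this PR: `strip_system_prefix` and `apply_override_config` are hypothetical helper names, token IDs are plain integers, and `SimpleNamespace` stands in for a Hugging Face config object. The first helper assumes the agent loop strips a system-prompt token prefix from later turns and shows that an empty prefix (a model without a system prompt) should be a no-op. The second assumes `override_config` is a flat dict of field names to values.

```python
from types import SimpleNamespace


def strip_system_prefix(token_ids, system_prefix_ids):
    """Drop a leading system-prompt prefix only when it is actually present.

    Hypothetical helper: an empty prefix models a chat template that
    emits no system prompt, in which case the tokens pass through unchanged.
    """
    n = len(system_prefix_ids)
    if n and token_ids[:n] == system_prefix_ids:
        return token_ids[n:]
    return token_ids


def apply_override_config(config, override_config):
    """Copy each override onto the config object (hypothetical helper).

    `override_config` may be None or empty, meaning "no overrides".
    """
    for key, value in (override_config or {}).items():
        setattr(config, key, value)
    return config


# Model without a system prompt: nothing is stripped.
no_prefix = strip_system_prefix([5, 6, 7], [])
# Model whose template adds the prefix [1, 2]: the prefix is removed.
with_prefix = strip_system_prefix([1, 2, 5, 6], [1, 2])
print(no_prefix, with_prefix)

# Stand-in for a reward model's config; the field names are made up.
config = SimpleNamespace(num_labels=2, attn_dropout=0.1)
apply_override_config(config, {"num_labels": 1})
print(config.num_labels)
```

Treating the empty prefix as a no-op keeps a single code path for models with and without a system prompt, which is one plausible way to get the compatibility described above.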

Test

  • Trained Qwen VL and other internal multi-modal models with a customized reward on the agent loop
  • CI

Checklist Before Submitting

[!IMPORTANT] Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

pengwu22, Nov 25 '25 23:11

CLA assistant check
All committers have signed the CLA.

CLAassistant, Nov 25 '25 23:11