verl for multi-turn without tool/interaction?

Open Junyu-Kong opened this issue 5 months ago • 2 comments

I want to train a multi-turn conversation LLM with verl, but the PPO config check requires a tool or interaction config when I enable multi-turn. What are my options here? Should I create a vacuous tool, or just train without multi-turn?

Junyu-Kong Jul 07 '25 08:07

@Junyu-Kong If you do not have a tool or interaction, how would you run a multi-turn conversation? If you just want to use async server mode instead of batch mode for single-turn rollouts, set the following:

    actor_rollout_ref.rollout.name=sglang \
    actor_rollout_ref.rollout.mode=async \
    actor_rollout_ref.rollout.multi_turn.enable=False \

wuxibin89 Jul 08 '25 02:07

@wuxibin89 May I ask whether VeRL supports training with mixed batches that contain both tool-use samples and non-tool samples?

HJYao00 Nov 18 '25 08:11