verl
verl for multi-turn without tool/interaction?
I want to train a multi-turn conversational LLM with verl, but the PPO config check requires a tool or interaction config when I enable multi-turn. What are my options here? Should I create a vacuous tool, or just train without multi-turn?
@Junyu-Kong If you have neither a tool nor an interaction, how would you do multi-turn conversation? If you just want to run single-turn rollouts in async server mode instead of batch mode, set the following:
actor_rollout_ref.rollout.name=sglang \
actor_rollout_ref.rollout.mode=async \
actor_rollout_ref.rollout.multi_turn.enable=False
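For anyone assembling the launch command from a script, here is a minimal sketch of how these three overrides could be appended to a trainer invocation. The override keys and values are the ones quoted above; the entry point `verl.trainer.main_ppo` and the helper names `build_command` and `render` are assumptions for illustration, so substitute whatever your own launch script uses.

```python
import shlex

# Overrides quoted in the reply above: async server mode with multi-turn disabled.
SINGLE_TURN_ASYNC_OVERRIDES = {
    "actor_rollout_ref.rollout.name": "sglang",
    "actor_rollout_ref.rollout.mode": "async",
    "actor_rollout_ref.rollout.multi_turn.enable": "False",
}


def build_command(overrides, entry_point="verl.trainer.main_ppo"):
    """Return an argv list of the form: python3 -m <entry_point> key=value ...

    The default entry point is an assumption; pass your own module if it differs.
    """
    argv = ["python3", "-m", entry_point]
    argv += [f"{key}={value}" for key, value in overrides.items()]
    return argv


def render(argv):
    """Format argv as a multi-line shell command with backslash continuations."""
    return " \\\n    ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # Prints the command only; it does not launch training.
    print(render(build_command(SINGLE_TURN_ASYNC_OVERRIDES)))
```

The script only prints the command, so you can inspect it and add the rest of your usual overrides (data paths, model, trainer settings) before running anything.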
@wuxibin89 May I ask whether VeRL supports training with mixed batches that contain both tool-use samples and non-tool samples?