
Does vLLM support multi-turn rollout?

Open Kedaya66 opened this issue 2 months ago • 1 comment

System Info

python3 -m verl.trainer.main_ppo \
algorithm.adv_estimator=grpo \
data.train_files='' \
data.val_files='' \
reward_model.reward_manager=collabllm \
+reward_model.reward_kwargs.metric_weights.intention=1 \
+reward_model.reward_kwargs.metric_weights.token_amount=-0.0001 \
+reward_model.reward_kwargs.llm_judge_kwargs.model=azure/gpt-4.1 \
+reward_model.reward_kwargs.llm_judge_kwargs.max_tokens=2048 \
+reward_model.reward_kwargs.llm_judge_kwargs.temperature=0 \
data.train_batch_size=$TRAIN_BATCH_SIZE \
data.val_batch_size=$MICRO_BATCH_SIZE \
data.train_max_samples=32 \
data.val_max_samples=32 \
data.max_prompt_length=2048 \
data.max_response_length=2048 \
data.filter_overlong_prompts=True \
data.truncation='error' \
actor_rollout_ref.model.path=$DEBUG_MODEL \
actor_rollout_ref.actor.strategy=fsdp2 \
actor_rollout_ref.rollout.dtype=bfloat16 \
actor_rollout_ref.actor.fsdp_config.model_dtype=bf16 \
actor_rollout_ref.ref.fsdp_config.model_dtype=bf16 \
actor_rollout_ref.actor.optim.lr=1e-6 \
actor_rollout_ref.model.use_remove_padding=True \
actor_rollout_ref.actor.ppo_mini_batch_size=$MICRO_BATCH_SIZE \
actor_rollout_ref.rollout.tensor_model_parallel_size=$GPU_NUMS \
actor_rollout_ref.actor.ulysses_sequence_parallel_size=2 \
actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=1 \
actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \
actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=1 \
actor_rollout_ref.actor.use_kl_loss=True \
actor_rollout_ref.actor.kl_loss_coef=0.001 \
actor_rollout_ref.actor.kl_loss_type=low_var_kl \
actor_rollout_ref.actor.entropy_coeff=0 \
actor_rollout_ref.model.lora_rank=64 \
actor_rollout_ref.model.lora_alpha=64 \
actor_rollout_ref.model.target_modules=all-linear \
actor_rollout_ref.actor.fsdp_config.fsdp_size=-1 \
actor_rollout_ref.model.enable_gradient_checkpointing=False \
actor_rollout_ref.actor.fsdp_config.param_offload=False \
actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
actor_rollout_ref.ref.fsdp_config.param_offload=False \
actor_rollout_ref.rollout.name=sglang \
actor_rollout_ref.rollout.multi_turn.enable=true \
actor_rollout_ref.rollout.gpu_memory_utilization=0.65 \
actor_rollout_ref.rollout.n=$ROLL_NUM \
actor_rollout_ref.rollout.load_format=safetensors \
actor_rollout_ref.rollout.layered_summon=True \
actor_rollout_ref.rollout.temperature=1.0 \
actor_rollout_ref.rollout.enable_chunked_prefill=False \
+actor_rollout_ref.rollout.multi_turn.format=hermes \
+actor_rollout_ref.rollout.multi_turn.max_user_turns=3 \
+actor_rollout_ref.rollout.multi_turn.max_assistant_turns=4 \
+actor_rollout_ref.rollout.multi_turn.num_repeat_rollouts=1 \
+actor_rollout_ref.rollout.agent.agent_loop_config_path=$AGENTLOOP_CONFIG_PATH \
algorithm.use_kl_in_reward=False \
trainer.critic_warmup=0 \
trainer.logger='["wandb"]' \
actor_rollout_ref.rollout.trace.backend=weave \
actor_rollout_ref.rollout.trace.token2text=True \
trainer.project_name=$PRJ_NAME \
trainer.experiment_name=$EXP_NAME \
trainer.nnodes=1 \
trainer.n_gpus_per_node=$GPU_NUMS \
trainer.save_freq=30 \
trainer.test_freq=30 \
trainer.total_epochs=20 \
custom_reward_function.path=recipe/collabllm_jiuan/reward_function.py \
custom_reward_function.name=conversation_level_reward_func \
+actor_rollout_ref.rollout.multi_turn.interaction_config_path="$PROJECT_DIR/recipe/collabllm/config/collabllm_interaction_config.yaml" \
trainer.log_val_generations=1023 \
trainer.val_before_train=False \
trainer.default_local_dir=$SAVE_DIR \
trainer.resume_from_path=$RESUME_PATH \
+data.apply_chat_template_kwargs.enable_thinking=False
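
One way to sanity-check an override set like this before launching is to dump the merged config: Hydra rejects overrides that assign a key both a scalar and sub-keys (e.g. multi_turn=true alongside +multi_turn.format=...), and the dump catches that before any GPUs are allocated. A minimal sketch, assuming main_ppo is a standard Hydra entry point; --cfg job is plain Hydra, not verl-specific:

# print the resolved config without starting training;
# append the full override list from the command above
python3 -m verl.trainer.main_ppo --cfg job \
actor_rollout_ref.rollout.multi_turn.enable=true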

Does vLLM not perform rollout normally?
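
For comparison, this is the variant I would expect for a vLLM backend. A sketch only, assuming a recent verl where multi-turn rollout runs through the async agent-loop server; rollout.mode=async is taken from the agent-loop docs and not verified on this exact version:

# swap the sglang rollout lines in the command above for these
# (rollout.mode=async is an assumption; check it against your verl version)
actor_rollout_ref.rollout.name=vllm \
actor_rollout_ref.rollout.mode=async \
actor_rollout_ref.rollout.multi_turn.enable=true \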

Information

  • [ ] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [x] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [x] My own task or dataset (give details below)

Reproduction

1

Expected behavior

1

Kedaya66 • Nov 14 '25 13:11

I have the same question. There is also a multi-turn rollout script at examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_multiturn_vllm_fsdp.sh, but I don't see how the relevant code handles multi-turn generation.
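
For reference, a quick way to trace it is to run the example and grep for where multi-turn generation is dispatched. The grep target below is a guess at the repo layout, not a verified path:

# run the official example (check the script header for expected GPU count / env vars)
bash examples/sglang_multiturn/run_qwen2.5-3b_gsm8k_multiturn_vllm_fsdp.sh
# then search for the multi-turn dispatch point (path is a guess)
grep -rn "multi_turn" verl/workers/rollout/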

FabianSchuetze • Nov 18 '25 10:11