Bug when using `verl` with `sglang + LoRA`
System Info
System
ubuntu==20.04
RTX 3090 * 8
Environment
verl==0.7.0
sglang==0.5.2
torch==2.8.0
transformers==4.56.1
Information
- [ ] The official example scripts
- [x] My own modified scripts
Tasks
- [x] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Description
When running the following script:
python3 -m verl.trainer.main_ppo \
--config-path="$CONFIG_PATH" \
--config-name='gsm8k_multiturn_grpo' \
algorithm.adv_estimator=grpo \
data.sampler.class_name="RandomCurriculumSampler" \
data.sampler.class_path="pkg://tests.utils.dataset.test_create_rl_sampler_on_cpu" \
data.dataloader_num_workers=0 \
data.max_prompt_length=1024 \
data.max_response_length=1024 \
data.train_batch_size=16 \
data.filter_overlong_prompts=True \
data.truncation='error' \
data.return_raw_chat=True \
actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
actor_rollout_ref.actor.optim.lr=1e-6 \
actor_rollout_ref.model.lora_rank=8 \
actor_rollout_ref.model.lora_alpha=32 \
actor_rollout_ref.model.target_modules=all-linear \
actor_rollout_ref.model.use_remove_padding=True \
actor_rollout_ref.actor.ppo_mini_batch_size=8 \
actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=8 \
actor_rollout_ref.actor.use_kl_loss=True \
actor_rollout_ref.actor.kl_loss_coef=0.001 \
actor_rollout_ref.actor.kl_loss_type=low_var_kl \
actor_rollout_ref.actor.entropy_coeff=0 \
actor_rollout_ref.model.enable_gradient_checkpointing=True \
actor_rollout_ref.actor.fsdp_config.param_offload=False \
actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
actor_rollout_ref.rollout.name=sglang \
actor_rollout_ref.rollout.gpu_memory_utilization=0.7 \
actor_rollout_ref.rollout.n=8 \
actor_rollout_ref.model.use_shm=True \
actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=8 \
actor_rollout_ref.ref.fsdp_config.param_offload=True \
algorithm.use_kl_in_reward=False \
trainer.critic_warmup=0 \
trainer.logger='["console","wandb"]' \
trainer.project_name='gsm8k_async_rl' \
trainer.experiment_name='qwen3-4b_function_rm-gsm8k-sgl-multi-w-tool-verify-n16' \
trainer.n_gpus_per_node=8 \
trainer.nnodes=1 \
trainer.save_freq=-1 \
trainer.test_freq=20 \
data.train_files=$HOME/data/gsm8k/train.parquet \
data.val_files=$HOME/data/gsm8k/test.parquet \
actor_rollout_ref.rollout.multi_turn.tool_config_path="$PROJECT_DIR/examples/sglang_multiturn/config/tool_config/gsm8k_tool_config.yaml" \
trainer.total_epochs=2 $@
the following error occurred:
ray.exceptions.RayTaskError(IndexError): ray::TaskRunner.run()
File "/verl/verl/trainer/main_ppo.py", line 343, in run
trainer.fit()
File "/verl/verl/trainer/ppo/ray_trainer.py", line 1039, in fit
val_metrics = self._validate()
File "/verl/verl/trainer/ppo/ray_trainer.py", line 587, in _validate
test_output_gen_batch_padded = self.actor_rollout_wg.generate_sequences(test_gen_batch_padded)
File "/verl/verl/single_controller/ray/base.py", line 48, in __call__
output = ray.get(output)
ray.exceptions.RayTaskError(IndexError): ray::WorkerDict.actor_rollout_generate_sequences()
File "/verl/verl/single_controller/ray/base.py", line 700, in func
return getattr(self.worker_dict[key], name)(*args, **kwargs)
File "/verl/verl/single_controller/base/decorator.py", line 442, in inner
return func(*args, **kwargs)
File "/verl/verl/utils/transferqueue_utils.py", line 199, in dummy_inner
return func(*args, **kwargs)
File "/verl/verl/utils/profiler/profile.py", line 256, in wrapper
return func(self_instance, *args, **kwargs_inner)
File "/verl/verl/workers/fsdp_workers.py", line 920, in generate_sequences
loop.run_until_complete(self.rollout_mode())
File "/verl/verl/workers/fsdp_workers.py", line 716, in rollout_mode
await self.rollout.update_weights(per_tensor_param, peft_config=peft_config, base_sync_done=self.base_sync_done)
File "/verl/verl/workers/rollout/sglang_rollout/sglang_rollout.py", line 1525, in update_weights
await sgl_update_weights(
File "/verl/sglangorg/python/sglang/srt/weight_sync/utils.py", line 58, in update_weights
MultiprocessingSerializer.serialize(
File "/verl/sglangorg/python/sglang/srt/utils.py", line 1856, in serialize
ForkingPickler(buf).dump(obj)
File "/verl/sglangorg/python/sglang/srt/patch_torch.py", line 42, in _reduce_tensor_modified
output_args = _modify_tuple(
File "/verl/sglangorg/python/sglang/srt/patch_torch.py", line 71, in _modify_tuple
return *t[:index], modifier(t[index]), *t[index + 1 :]
IndexError: tuple index out of range
This issue occurs when using `verl` with `sglang` and LoRA (`actor_rollout_ref.model.lora_rank > 0`).
It seems to be related to tensor serialization in `sglang/srt/patch_torch.py`, specifically the `_modify_tuple` function, where an `IndexError` arises from accessing an invalid tuple index.
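For reference, the failing pattern can be reproduced in isolation. The function body below mirrors the `_modify_tuple` frame shown in the traceback (it is not the full sglang source); it only demonstrates that a rebuild-args tuple shorter than the patched index produces exactly this `IndexError`:

```python
# Standalone repro of the IndexError mechanics. The body mirrors the
# _modify_tuple frame from the traceback; it is not the sglang source tree.
def _modify_tuple(t, index, modifier):
    return *t[:index], modifier(t[index]), *t[index + 1:]

# Using an arbitrary index larger than len(t) - 1, just to show the failure:
# t[index] raises before modifier is ever applied.
_modify_tuple(("a", "b", "c"), 6, lambda x: x)  # IndexError: tuple index out of range
```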
Expected behavior
Please add official support for using `verl` together with `sglang` under a LoRA configuration.
Currently, the weight synchronization mechanism in `verl` does not appear to handle LoRA adapters correctly when `sglang` is used as the rollout backend.
Could you please:
- Confirm whether LoRA is supported with the sglang rollout in `verl`?
- Suggest a workaround or fix?
Same bug here.
Hi, did you solve the bug?
No, not yet. But I found that the issue seems to be caused by LoRA weights being kept on the CPU — see this line. That appears to trigger the problem.
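To illustrate why a CPU-resident tensor could break the patched serializer (this is only my reading of the traceback, not a verified diagnosis): `torch.multiprocessing.reductions.reduce_tensor` returns a long rebuild-args tuple for CUDA tensors but only a short one for CPU tensors, so a patch that rewrites a fixed position of that tuple can index past the end for LoRA weights that were never moved to the GPU.

```python
# Minimal sketch, assuming sglang's patched reducer expects the CUDA
# rebuild-args layout of torch.multiprocessing.reductions.reduce_tensor.
import torch
from torch.multiprocessing.reductions import reduce_tensor

_, cpu_args = reduce_tensor(torch.zeros(4))   # e.g. a LoRA weight left on CPU
print(len(cpu_args))                          # only 3 entries here: (tensor type, storage, metadata)

if torch.cuda.is_available():
    _, cuda_args = reduce_tensor(torch.zeros(4, device="cuda"))
    print(len(cuda_args))                     # much longer tuple, including device info

# A patch that rewrites a fixed position of this tuple (as _modify_tuple does)
# will index past the end of the short CPU tuple and raise IndexError.
```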
Maybe `param_offload=True` triggers the problem?
I just tested it — even after setting actor_rollout_ref.ref.fsdp_config.param_offload=False in the original script, the same error still occurs.
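In case it helps while a proper fix lands, here is a hypothetical workaround sketch (untested). It assumes that `per_tensor_param`, which verl passes into `rollout.update_weights` in the traceback above, yields `(name, tensor)` pairs; the wrapper function and where to apply it are my own guesses, not an existing verl API.

```python
# Hypothetical workaround sketch (untested). Assumes per_tensor_param yields
# (name, tensor) pairs; wrapping it before rollout.update_weights is called in
# fsdp_workers.py is a guess, not a documented verl extension point.
def ensure_params_on_gpu(per_tensor_param, device="cuda"):
    for name, tensor in per_tensor_param:
        if tensor.device.type == "cpu":
            # Move CPU-resident tensors (e.g. LoRA adapters) to the GPU before
            # they reach sglang's weight-sync serializer.
            tensor = tensor.to(device, non_blocking=True)
        yield name, tensor
```

Moving the adapters to the GPU just for the sync could add memory pressure on a 3090, so please treat this only as a direction to investigate.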
Bad news.
I have run into the same problem.