
Bug when using `verl` with `sglang + LoRA`

Open CatWongCoi opened this issue 2 months ago • 7 comments

System Info

System

ubuntu==20.04
RTX 3090 * 8

Environment

verl==0.7.0
sglang==0.5.2
torch==2.8.0
transformers==4.56.1

Information

  • [ ] The official example scripts
  • [x] My own modified scripts

Tasks

  • [x] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

Description

When running the following script:

python3 -m verl.trainer.main_ppo \
    --config-path="$CONFIG_PATH" \
    --config-name='gsm8k_multiturn_grpo' \
    algorithm.adv_estimator=grpo \
    data.sampler.class_name="RandomCurriculumSampler" \
    data.sampler.class_path="pkg://tests.utils.dataset.test_create_rl_sampler_on_cpu" \
    data.dataloader_num_workers=0 \
    data.max_prompt_length=1024 \
    data.max_response_length=1024 \
    data.train_batch_size=16 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    data.return_raw_chat=True \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.lora_rank=8 \
    actor_rollout_ref.model.lora_alpha=32 \
    actor_rollout_ref.model.target_modules=all-linear \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=8 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=8 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=sglang \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.7 \
    actor_rollout_ref.rollout.n=8 \
    actor_rollout_ref.model.use_shm=True \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=8 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.logger='["console","wandb"]' \
    trainer.project_name='gsm8k_async_rl' \
    trainer.experiment_name='qwen3-4b_function_rm-gsm8k-sgl-multi-w-tool-verify-n16' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.test_freq=20 \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    actor_rollout_ref.rollout.multi_turn.tool_config_path="$PROJECT_DIR/examples/sglang_multiturn/config/tool_config/gsm8k_tool_config.yaml" \
    trainer.total_epochs=2 $@

the following error occurred:

ray.exceptions.RayTaskError(IndexError): ray::TaskRunner.run()
  File "/verl/verl/trainer/main_ppo.py", line 343, in run
    trainer.fit()
  File "/verl/verl/trainer/ppo/ray_trainer.py", line 1039, in fit
    val_metrics = self._validate()
  File "/verl/verl/trainer/ppo/ray_trainer.py", line 587, in _validate
    test_output_gen_batch_padded = self.actor_rollout_wg.generate_sequences(test_gen_batch_padded)
  File "/verl/verl/single_controller/ray/base.py", line 48, in __call__
    output = ray.get(output)
ray.exceptions.RayTaskError(IndexError): ray::WorkerDict.actor_rollout_generate_sequences()
  File "/verl/verl/single_controller/ray/base.py", line 700, in func
    return getattr(self.worker_dict[key], name)(*args, **kwargs)
  File "/verl/verl/single_controller/base/decorator.py", line 442, in inner
    return func(*args, **kwargs)
  File "/verl/verl/utils/transferqueue_utils.py", line 199, in dummy_inner
    return func(*args, **kwargs)
  File "/verl/verl/utils/profiler/profile.py", line 256, in wrapper
    return func(self_instance, *args, **kwargs_inner)
  File "/verl/verl/workers/fsdp_workers.py", line 920, in generate_sequences
    loop.run_until_complete(self.rollout_mode())
  File "/verl/verl/workers/fsdp_workers.py", line 716, in rollout_mode
    await self.rollout.update_weights(per_tensor_param, peft_config=peft_config, base_sync_done=self.base_sync_done)
  File "/verl/verl/workers/rollout/sglang_rollout/sglang_rollout.py", line 1525, in update_weights
    await sgl_update_weights(
  File "/verl/sglangorg/python/sglang/srt/weight_sync/utils.py", line 58, in update_weights
    MultiprocessingSerializer.serialize(
  File "/verl/sglangorg/python/sglang/srt/utils.py", line 1856, in serialize
    ForkingPickler(buf).dump(obj)
  File "/verl/sglangorg/python/sglang/srt/patch_torch.py", line 42, in _reduce_tensor_modified
    output_args = _modify_tuple(
  File "/verl/sglangorg/python/sglang/srt/patch_torch.py", line 71, in _modify_tuple
    return *t[:index], modifier(t[index]), *t[index + 1 :]
IndexError: tuple index out of range

This issue occurs when using verl with sglang and LoRA (actor_rollout_ref.model.lora_rank > 0). It seems to be related to tensor serialization in sglang/srt/patch_torch.py — specifically in the _modify_tuple function where an IndexError arises from accessing an invalid tuple index.
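To illustrate the failure pattern, here is a minimal standalone sketch (my own reconstruction, not the actual sglang code; the concrete index the patch uses is unknown to me, so a stand-in is used). torch's multiprocessing reducer returns a short argument tuple for CPU tensors and a much longer one for CUDA tensors, so an index that assumes the CUDA layout can fall outside the tuple for other tensors:

```python
# Hypothetical reconstruction of the failure pattern, not the actual sglang code.
import torch
from torch.multiprocessing import reductions


def _modify_tuple(t, index, modifier):
    # Same pattern as the line shown in the traceback.
    return *t[:index], modifier(t[index]), *t[index + 1 :]


# torch's reducer gives CPU tensors a much shorter argument tuple than CUDA tensors.
cpu_args = reductions.reduce_tensor(torch.zeros(2))[1]
print("CPU reduce tuple length:", len(cpu_args))

if torch.cuda.is_available():
    cuda_args = reductions.reduce_tensor(torch.zeros(2, device="cuda"))[1]
    print("CUDA reduce tuple length:", len(cuda_args))

    cuda_index = len(cuda_args) - 1  # stand-in for an index that assumes the CUDA layout
    _modify_tuple(cuda_args, cuda_index, lambda x: x)  # fine on the CUDA tuple
    _modify_tuple(cpu_args, cuda_index, lambda x: x)   # IndexError: tuple index out of range
```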

Expected behavior

Please add official support for running verl with the sglang rollout when LoRA is enabled. Currently, the weight synchronization mechanism in verl does not appear to handle LoRA adapters correctly when sglang is used as the rollout backend.

Could you please:

  • Confirm whether LoRA is supported with the sglang rollout in verl?
  • Suggest a workaround or fix?

CatWongCoi avatar Nov 10 '25 02:11 CatWongCoi

same bug

Kedaya66 avatar Nov 13 '25 08:11 Kedaya66

Hi, did you solve the bug?

Kedaya66 avatar Nov 13 '25 08:11 Kedaya66

No, not yet. But I found that the issue seems to be caused by LoRA weights being kept on the CPU — see this line. That appears to trigger the problem.
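
If that is indeed the trigger, one possible workaround (untested on my side) would be to move every tensor onto the GPU before it reaches sglang's weight sync, so the patched reducer only ever sees CUDA tensors. A rough sketch; ensure_cuda is a hypothetical helper, and the exact hook point (wrapping per_tensor_param before the rollout.update_weights call from the traceback) is an assumption:

```python
from typing import Iterable, Iterator, Tuple

import torch


def ensure_cuda(
    per_tensor_param: Iterable[Tuple[str, torch.Tensor]],
    device: str = "cuda",
) -> Iterator[Tuple[str, torch.Tensor]]:
    """Yield (name, tensor) pairs with every tensor moved onto the GPU.

    Hypothetical helper, not part of verl: the idea is to wrap the
    per_tensor_param iterator before it is handed to
    rollout.update_weights(...) so that sglang's patched reducer only ever
    serializes CUDA tensors.
    """
    for name, tensor in per_tensor_param:
        if tensor.device.type != "cuda":
            tensor = tensor.to(device)
        yield name, tensor
```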

CatWongCoi avatar Nov 13 '25 09:11 CatWongCoi

Maybe param_offload=True triggers the problem?

Kedaya66 avatar Nov 13 '25 09:11 Kedaya66

I just tested it — even after setting actor_rollout_ref.ref.fsdp_config.param_offload=False in the original script, the same error still occurs.
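
To narrow it down further, a small diagnostic helper might confirm whether the adapter weights are the tensors sitting on CPU at sync time. This is a hypothetical debugging snippet, not verl code; where exactly to call it inside fsdp_workers.py is an assumption:

```python
import torch.nn as nn


def report_cpu_lora_params(module: nn.Module) -> None:
    """Print every LoRA adapter tensor that is not on a CUDA device.

    Hypothetical debugging helper: call it on the PEFT-wrapped actor module
    right before weight sync (the exact hook point in fsdp_workers.py is an
    assumption) to see whether the adapter weights are the ones on CPU.
    """
    for name, param in module.named_parameters():
        if "lora_" in name and param.device.type != "cuda":
            print(f"[lora-on-cpu] {name}: {param.device}")
```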

CatWongCoi avatar Nov 13 '25 09:11 CatWongCoi

bad news

Kedaya66 avatar Nov 13 '25 09:11 Kedaya66

I have run into the same problem.

williamIIliu avatar Nov 29 '25 13:11 williamIIliu