verl raises a ValueError when running Qwen2.5-VL 3B for 200+ steps

I hit shape errors at random points after about 200 training steps.

The error is raised from https://github.com/volcengine/verl/blob/main/verl/workers/actor/dp_actor.py#L127:

ValueError: Image features and image tokens do not match: tokens: 2601, features 2600.

This is my training script. By 200+ steps, the geo3k data has already been traversed multiple times. I wonder why this happens; the step at which the error occurs also varies from run to run. Thank you.

set -x
export MMRL_ACC=1
ENGINE=${1:-vllm}

python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=/xxx/verl_data/train.parquet \
    data.val_files=/xxx/verl_data/test.parquet \
    data.train_batch_size=256 \
    data.max_prompt_length=1024 \
    data.max_response_length=2048 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    data.image_key=images \
    actor_rollout_ref.model.path=/xxx/Qwen2.5-VL-3B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=32 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.01 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
    actor_rollout_ref.rollout.name=$ENGINE \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.enable_chunked_prefill=False \
    actor_rollout_ref.rollout.enforce_eager=False \
    actor_rollout_ref.rollout.free_cache_engine=False \
    actor_rollout_ref.rollout.n=5 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=8 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.logger=['console'] \
    trainer.project_name='verl_grpo_example_geo3k' \
    trainer.experiment_name='qwen2_5_vl_7b_function_rm' \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.test_freq=5 \
    actor_rollout_ref.actor.use_torch_compile=False \
    trainer.val_before_train=False \
    trainer.total_epochs=150 $@

May 22 '25 09:05 onehaitao

Sorry, I don't have an answer to your question, but I'm trying to run the same example script on Qwen 2.5 VL 3B, and I keep getting the following error:

TypeError: Qwen2_5_VLForConditionalGeneration.forward() got an unexpected keyword argument 'temperature'

I'm using the stable docker image whatcanyousee/verl:ngc-cu124-vllm0.8.5-sglang0.4.6-mcore0.12.0-te2.3, and I also get this error for the 7B model.

May 24 '25 22:05 afland

Same problem here, but I found the source. The latest commit (https://github.com/volcengine/verl/commit/4779f2616428a525746bdfb65be447bcdca3012e) appears to introduce temperature into the actor_module forward call. You can try commenting those lines out (L131 and L199 in verl/workers/actor/dp_actor.py).
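
If editing dp_actor.py by hand is not an option, a different workaround (not what the commit or this comment does) is to drop any keyword argument that the model's forward() does not declare. Below is a minimal sketch using only the standard inspect module. filter_supported_kwargs and DummyModel are hypothetical names for illustration and are not part of verl or transformers.

```python
import inspect


def filter_supported_kwargs(fn, kwargs):
    """Return only the kwargs that `fn` can accept by name."""
    params = inspect.signature(fn).parameters
    # If fn declares **kwargs it accepts everything; pass through unchanged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


class DummyModel:
    # Stand-in for a model whose forward() has no `temperature` parameter.
    def forward(self, input_ids, attention_mask=None):
        return len(input_ids)


model = DummyModel()
call_kwargs = {"input_ids": [1, 2, 3], "attention_mask": None, "temperature": 1.0}
safe_kwargs = filter_supported_kwargs(model.forward, call_kwargs)
result = model.forward(**safe_kwargs)
```

Note that silently dropping an argument also drops whatever behavior it was meant to trigger, so this only makes sense as a stopgap.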

May 26 '25 05:05 FangXinyu-0913

I had the same problem: Image features and image tokens do not match: tokens: 5261, features 5260

May 26 '25 11:05 zyx1213271098

Have you solved it now?

May 27 '25 02:05 onehaitao

I haven't solved it yet, but I suspect that the problem is caused by the data or the tokenizer.

May 27 '25 08:05 zyx1213271098

I found that the model itself predicted an <|image_pad|> token in its response. You can decode input_ids and check whether the cause is the same in your case.
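
A minimal sketch of that check. It assumes the generated responses are available as lists of token ids and that the id of the image placeholder token (<|image_pad|> in the Qwen2.5-VL tokenizer) has been looked up beforehand, for example with tokenizer.convert_tokens_to_ids("<|image_pad|>"). The function name and the id used below are for illustration only.

```python
def find_stray_image_tokens(responses, image_token_id):
    """Map response index -> number of image placeholder tokens it contains.

    Only responses that contain at least one placeholder are reported.
    """
    stray = {}
    for idx, token_ids in enumerate(responses):
        count = sum(1 for t in token_ids if t == image_token_id)
        if count:
            stray[idx] = count
    return stray


# Illustrative id; look up the real one from your tokenizer.
IMAGE_TOKEN_ID = 151655

responses = [
    [100, 101, 102],                  # clean response
    [100, IMAGE_TOKEN_ID, 102, 103],  # the model emitted one placeholder
]
stray = find_stray_image_tokens(responses, IMAGE_TOKEN_ID)
```

A single stray placeholder in a response would be consistent with a count that is off by exactly one, such as tokens: 2601, features 2600.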

May 29 '25 03:05 zyx1213271098

Hi @zyx1213271098, could you also share the code snippet you used to solve the issue?

May 29 '25 10:05 JierunChen

Is there a solution?

May 29 '25 11:05 NaivePawn

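When this problem appears, the model has most likely already collapsed during training. I added a try block and skip the problematic step.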

Jun 03 '25 08:06 zyx1213271098

Is this error only present in Qwen2.5-VL 3B? I encountered the same issue, and I still get the same error even after updating to the latest code. And indeed, every time I hit this problem the training has already collapsed, with model performance declining continuously during training. Could you tell me whether the performance decline is caused by the verl code or by my own configuration?

Jul 17 '25 03:07 HelloWorld506

Same issue with Qwen2.5-VL 32B.

Jul 25 '25 06:07 onehaitao

Same problem here; I am training Qwen2.5-VL 7B on multimodal data. Has anyone found any new clues?

Aug 09 '25 04:08 xwjahahahaha

I ran into the same problem during inference with Qwen2.5-VL 7B.

Aug 11 '25 14:08 xingbo-jiang

Here's something similar:

For me, the issue arises from truncation in rl_dataset. I use 'left' truncation, and since the image placeholder token is placed at the start of each sample in my dataset, it can get truncated off. As a result, the image_placeholder may be removed, leading to fewer image tokens than features.

In your case, however, it is the opposite: the number of tokens exceeds the number of features, which is quite strange.
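
The effect of left truncation can be reproduced with plain lists. This sketch assumes the image placeholders sit at the start of the prompt and that truncation simply keeps the last max_len tokens. The helper names and token ids are made up for illustration and are not verl's implementation.

```python
IMAGE_TOKEN_ID = 9  # illustrative id, not the real tokenizer value


def truncate_left(token_ids, max_len):
    """Keep only the last `max_len` tokens (drops tokens from the left)."""
    return token_ids[-max_len:] if max_len > 0 else []


def count_image_tokens(token_ids, image_token_id=IMAGE_TOKEN_ID):
    """Count how many image placeholder tokens the sequence contains."""
    return sum(1 for t in token_ids if t == image_token_id)


# 4 image placeholders followed by 6 text tokens.
prompt = [IMAGE_TOKEN_ID] * 4 + [1, 2, 3, 4, 5, 6]
num_features = 4  # the vision encoder still produces 4 features

truncated = truncate_left(prompt, max_len=8)
tokens_before = count_image_tokens(prompt)
tokens_after = count_image_tokens(truncated)
mismatch = tokens_after != num_features
```

Here two of the four placeholders are cut off, so the token count falls below the feature count. This only explains the "fewer tokens than features" direction.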

Aug 13 '25 09:08 HanshuYAN

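Is there a solution?

When this problem appears, the model has most likely already collapsed during training. I added a try block and skip the problematic step.

May I ask how you added the try block? Is it in ray_trainer.py?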

Aug 19 '25 10:08 Chenzhou2344

@Chenzhou2344, @zyx1213271098 Can you solve the problem? I need your help.

Sep 10 '25 02:09 Juvenilecris

@onehaitao Can you solve the problem? I need your help.

Sep 10 '25 05:09 Juvenilecris

@HanshuYAN, can you solve the problem? I need your help.

Sep 10 '25 05:09 Juvenilecris

I have the same problem.

File "/home/wangnn/anaconda3/envs/verl/lib/python3.10/site-packages/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py", line 1250, in forward
    raise ValueError(
ValueError: Image features and image tokens do not match: tokens: 10271, features 10605

Sep 10 '25 05:09 Juvenilecris

@HanshuYAN, I have the same problem. Please help me.

Sep 14 '25 03:09 Juvenilecris

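Is there a solution?

When this problem appears, the model has most likely already collapsed during training. I added a try block and skip the problematic step.

May I ask how you added the try block? Is it in ray_trainer.py?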

I wrapped everything in a try block, from the beginning of the dataloader's step to the end. If an exception is raised, I skip that step and do not increase the global step.
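
A minimal, single-process sketch of that pattern. run_training and fake_train_step are hypothetical stand-ins and do not reflect the actual structure of ray_trainer.py.

```python
def run_training(batches, train_step):
    """Run `train_step` on each batch, skipping batches that raise ValueError.

    Returns (global_step, skipped_batch_indices).
    """
    global_step = 0
    skipped = []
    for idx, batch in enumerate(batches):
        try:
            train_step(batch)
        except ValueError as exc:
            # Skip the failing batch and leave global_step unchanged.
            print(f"skipping batch {idx}: {exc}")
            skipped.append(idx)
            continue
        global_step += 1
    return global_step, skipped


def fake_train_step(batch):
    # Stand-in for the real step; fails when token/feature counts differ.
    if batch["tokens"] != batch["features"]:
        raise ValueError(
            "Image features and image tokens do not match: "
            f"tokens: {batch['tokens']}, features {batch['features']}"
        )


batches = [
    {"tokens": 2600, "features": 2600},
    {"tokens": 2601, "features": 2600},
    {"tokens": 2600, "features": 2600},
]
global_step, skipped = run_training(batches, fake_train_step)
```

In a multi-GPU run, all workers would have to skip the same step, which this sketch does not address. It also hides the symptom instead of fixing the cause.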

Sep 29 '25 07:09 YuhaoCheng

I encountered this problem, and the cause on my side was kind of stupid: I forgot to put the image placeholder in some of my prompts that have images passed in. Not sure if this is helpful, but it won't hurt to check this.
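
A quick way to check a dataset for this, assuming the missing piece is the image placeholder in the prompt text. The sketch further assumes each sample is a dict with a "prompt" string and an "images" list, and that the placeholder is the literal string "<image>". Adjust these to your own data format, since they are assumptions rather than a fixed verl schema.

```python
def find_placeholder_mismatches(samples, placeholder="<image>"):
    """Return (index, n_placeholders, n_images) for every inconsistent sample."""
    bad = []
    for idx, sample in enumerate(samples):
        n_placeholders = sample["prompt"].count(placeholder)
        n_images = len(sample.get("images") or [])
        if n_placeholders != n_images:
            bad.append((idx, n_placeholders, n_images))
    return bad


samples = [
    {"prompt": "<image>Find the angle x.", "images": ["img0.png"]},
    {"prompt": "Find the angle x.", "images": ["img1.png"]},  # placeholder missing
    {"prompt": "What is 2 + 2?", "images": []},
]
bad = find_placeholder_mismatches(samples)
```

Running this once over the training and validation files before starting a long job is cheap compared with failing after 200 steps.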

Oct 21 '25 01:10 ziqipang