RuntimeError: Expected there to be 1 prompt updates corresponding to 1 image items, but instead found 0 prompt updates

Open FerryHuang opened this issue 4 months ago • 4 comments

RuntimeError: Expected 1 prompt updates but found 0 prompt updates

Error

RuntimeError: Expected there to be 1 prompt updates corresponding to 1 image items, but instead found 0 prompt updates! Either the prompt text has missing/incorrect tokens for multi-modal inputs, or there is a problem with your implementation of merged multi-modal processor for this model (usually arising from an inconsistency between `_call_hf_processor` and `_get_prompt_updates`).

Model

Qwen-2.5-VL-7B model

Data Format

All verified correct:

Every sample has exactly 1 image and 1 <image> tag
Image format: {"image": "/path/to/image.jpg"}
Prompt format: "<image>Math problem text..."
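
A minimal sketch of how the "1 image, 1 <image> tag" claim above could be re-checked offline, assuming each sample is a dict. The helper names (`count_images`, `find_mismatches`) and the `prompt` key are hypothetical and are not part of verl, Swift, or vLLM; only the `image` key and the `<image>` tag come from the format shown above.

```python
from typing import Any

IMAGE_TAG = "<image>"  # placeholder tag used in this thread's prompts


def count_images(sample: dict[str, Any], image_key: str) -> int:
    """Return how many images a sample carries under image_key."""
    value = sample.get(image_key)
    if value is None:
        return 0
    if isinstance(value, (list, tuple)):
        return len(value)
    return 1  # a single path or image object


def find_mismatches(samples, image_key="image", prompt_key="prompt"):
    """Return (index, n_tags, n_images) for samples whose counts differ."""
    bad = []
    for i, sample in enumerate(samples):
        n_tags = str(sample.get(prompt_key, "")).count(IMAGE_TAG)
        n_images = count_images(sample, image_key)
        if n_tags != n_images:
            bad.append((i, n_tags, n_images))
    return bad


# Hypothetical samples: the second one is missing its <image> tag.
samples = [
    {"image": "/path/to/image.jpg", "prompt": "<image>Math problem text..."},
    {"image": "/path/to/image.jpg", "prompt": "Math problem text..."},
]
print(find_mismatches(samples))  # [(1, 0, 1)]
```

Running the same check with `image_key` set to whatever key the training config actually reads can also be informative: if the trainer looks up a different key than the one the dataset uses, every sample will report 0 images even though the raw data looks correct.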

Training

I am running DAPO training with the verl framework.

The data format has been checked and is correct, so this looks like a compatibility issue with the model's processor rather than a data problem.

Aug 04 '25 08:08 FerryHuang

Have you solved this problem? I encountered the same issue, and I've verified the data multiple times. I'm using Swift for the GRPO training on my end.

Model: Qwen-2.5-VL-3B
pip install transformers==4.51.3 accelerate
pip install trl==0.17.0
pip install vllm==0.8.5

Aug 18 '25 03:08 lzcomeon

Have you included data.image_key=images in the training script?

Sep 01 '25 05:09 Y-L-LIU

I think the <image> tag in the prompt can be changed to <im_start>, like this: vision_language.py

Sep 20 '25 03:09 cui36

Does verl support DAPO only when each sample contains a single image? Qwen 2.5 VL itself can run SFT on samples that contain multiple images.

Nov 28 '25 05:11 disperaller