
RuntimeError: Expected there to be 1 prompt updates corresponding to 1 image items, but instead found 0 prompt updates

Open FerryHuang opened this issue 4 months ago • 4 comments

RuntimeError: Expected 1 prompt updates but found 0 prompt updates

Error

RuntimeError: Expected there to be 1 prompt updates corresponding to 1 image items, but instead found 0 prompt updates! Either the prompt text has missing/incorrect tokens for multi-modal inputs, or there is a problem with your implementation of merged multi-modal processor for this model (usually arising from an inconsistency between `_call_hf_processor` and `_get_prompt_updates`).

Model

  • Qwen-2.5-VL-7B model

Data Format

All verified correct:

  • Every sample has exactly 1 image and 1 <image> tag
  • Image format: {"image": "/path/to/image.jpg"}
  • Prompt format: "<image>Math problem text..."
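For anyone who wants to re-run this kind of check on their own data, here is a minimal sketch that compares the number of `<image>` tags in each prompt with the number of image items in the sample. The field names `prompt` and `image`, the assumption that the prompt is a plain string, and the helper names `count_images` and `find_mismatches` are illustrative only; they are not part of verl or vLLM.

```python
from typing import Any

IMAGE_TAG = "<image>"  # placeholder used in the prompts described above


def count_images(sample: dict[str, Any], image_key: str = "image") -> int:
    """Return how many image items a sample carries under `image_key`."""
    value = sample.get(image_key)
    if value is None:
        return 0
    # A single path or dict counts as one image; a list counts per element.
    return len(value) if isinstance(value, (list, tuple)) else 1


def find_mismatches(
    samples: list[dict[str, Any]],
    prompt_key: str = "prompt",  # assumed field name, adjust to your dataset
    image_key: str = "image",    # assumed field name, adjust to your dataset
) -> list[int]:
    """Return indices of samples whose tag count differs from their image count."""
    bad = []
    for i, sample in enumerate(samples):
        n_tags = sample.get(prompt_key, "").count(IMAGE_TAG)
        n_images = count_images(sample, image_key)
        if n_tags != n_images:
            bad.append(i)
    return bad


if __name__ == "__main__":
    demo = [
        {"prompt": "<image>Math problem text...", "image": "/path/to/image.jpg"},
        {"prompt": "Math problem text...", "image": "/path/to/image.jpg"},
    ]
    print(find_mismatches(demo))
```

An empty result only says the raw samples are consistent. It does not cover what the chat template and the model processor do to the prompt afterwards, which is where the error message points.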

Training

Using DAPO training with the verl framework.

The data format is confirmed correct, so this looks like a model processor compatibility issue rather than a data problem.

FerryHuang avatar Aug 04 '25 08:08 FerryHuang

Have you solved this problem? I encountered the same issue, and I've verified the data multiple times. I'm using Swift for the GRPO training on my end.

Model: Qwen-2.5-VL-3B

pip install transformers==4.51.3 accelerate
pip install trl==0.17.0
pip install vllm==0.8.5

lzcomeon avatar Aug 18 '25 03:08 lzcomeon

Have you included `data.image_key=images` in the training script?

Y-L-LIU avatar Sep 01 '25 05:09 Y-L-LIU

I think the `<image>` placeholder in the prompt can be changed to `<im_start>`, like this: vision_language.py

cui36 avatar Sep 20 '25 03:09 cui36

Does verl only support DAPO when each sample contains exactly ONE image? Qwen 2.5 VL itself can do SFT with samples that contain multiple images.

disperaller avatar Nov 28 '25 05:11 disperaller