RuntimeError: Expected there to be 1 prompt updates corresponding to 1 image items, but instead found 0 prompt updates
RuntimeError: Expected 1 prompt updates but found 0 prompt updates
Error
RuntimeError: Expected there to be 1 prompt updates corresponding to 1 image items, but instead found 0 prompt updates! Either the prompt text has missing/incorrect tokens for multi-modal inputs, or there is a problem with your implementation of merged multi-modal processor for this model (usually arising from an inconsistency between `_call_hf_processor` and `_get_prompt_updates`).
Model
- Qwen-2.5-VL-7B model
Data Format
All verified correct:
- Every sample has exactly 1 image and 1 `<image>` tag
- Image format: `{"image": "/path/to/image.jpg"}`
- Prompt format: `"<image>Math problem text..."`
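For anyone who wants to re-run the same check, here is a minimal sketch of how the "1 image, 1 `<image>` tag" claim could be verified per sample. The `prompt` key and the `check_sample` helper are assumptions made for illustration; they are not part of verl, so adjust the key names to your dataset.

```python
def check_sample(sample, image_key="image", prompt_key="prompt", tag="<image>"):
    """Return a list of problems found in one sample (empty list means OK).

    `image_key` and `prompt_key` are hypothetical field names; change them
    to whatever your dataset actually uses.
    """
    problems = []
    images = sample.get(image_key)
    if images is None:
        problems.append(f"missing key {image_key!r}")
        n_images = 0
    elif isinstance(images, (list, tuple)):
        n_images = len(images)
    else:
        # A single path or image object counts as one image.
        n_images = 1

    # Count placeholder tags in the prompt text.
    n_tags = sample.get(prompt_key, "").count(tag)
    if n_images != n_tags:
        problems.append(f"{n_images} image(s) but {n_tags} {tag} tag(s)")
    return problems


good = {"image": "/path/to/image.jpg", "prompt": "<image>Math problem text..."}
bad = {"image": "/path/to/image.jpg", "prompt": "Math problem text..."}

print(check_sample(good))
print(check_sample(bad))
```

Calling the helper with `image_key="images"` on the same data also shows quickly whether the key the trainer reads differs from the key the dataset stores.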
Training
Using DAPO training with the VERL framework.
The data format is confirmed correct, so this looks like a model processor compatibility issue rather than a data problem.
Have you solved this problem? I encountered the same issue, and I've verified the data multiple times. I'm using Swift for the GRPO training on my end.
MODEL: Qwen-2.5-VL-3B
pip install transformers==4.51.3 accelerate
pip install trl==0.17.0
pip install vllm==0.8.5
Have you included `data.image_key=images` in the training script?
I think the `<image>` placeholder in the prompt can be changed to `<im_start>`, like this: vision_language.py
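As a rough illustration of that kind of placeholder swap: the Qwen2-VL family's chat template marks an image with `<|vision_start|><|image_pad|><|vision_end|>`. The sketch below rewrites a generic `<image>` tag into that form. Whether verl or Swift expects the prompt in this form, or performs the substitution itself, is not confirmed in this thread, so treat it as an experiment rather than a fix. `to_qwen_placeholder` is a hypothetical helper name.

```python
# Image placeholder used by the Qwen2-VL / Qwen2.5-VL chat template.
QWEN_IMAGE_PLACEHOLDER = "<|vision_start|><|image_pad|><|vision_end|>"


def to_qwen_placeholder(prompt, tag="<image>"):
    """Replace every generic image tag with the Qwen-style placeholder.

    Hypothetical helper for experimentation; the training framework may
    already do this step internally.
    """
    return prompt.replace(tag, QWEN_IMAGE_PLACEHOLDER)


print(to_qwen_placeholder("<image>Math problem text..."))
```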
Does verl only support DAPO when each sample contains a single image? Qwen 2.5 VL is able to perform SFT with samples that contain multiple images.