llama.cpp

Qwen3-VL co-ordinate and bounding box errors (grounding errors)

Open sujitvasanth opened this issue 1 month ago • 4 comments

Hi, Qwen3-VL bounding boxes and co-ordinates appear to be incorrect in both 4B (no co-ordinates at all) and 8B (poor localisation). This occurs even in the FP16 versions of these models, so it is not quantisation related.

Image

I can see that when convert_hf_to_gguf.py is run, the non-vision layers of the vision tower are removed. I'm not sure if this is the cause of the problem.

This does not occur in Hugging Face transformers, even for the same base model quantised to 4 bits.

Image

The problem is not isolated to the Python API; it also occurs in llama-mtmd-cli.exe.

Image

See also: https://github.com/JamePeng/llama-cpp-python/issues/20

sujitvasanth avatar Nov 09 '25 23:11 sujitvasanth

Coordinates in qwen3vl are relative to a 1000x1000 grid. You need to rescale them back to the original image size.
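
The rescaling described above can be sketched as follows. This is a minimal illustration, not code from llama.cpp or Qwen: the helper name `rescale_box` and the `(x1, y1, x2, y2)` box layout are assumptions made for the example.

```python
def rescale_box(box, image_width, image_height, grid=1000):
    """Map an (x1, y1, x2, y2) box from a grid x grid frame to pixel coordinates.

    Hypothetical helper: the box layout and the default grid size of 1000
    follow the description in this thread, not a library API.
    """
    x1, y1, x2, y2 = box
    # Multiply before dividing so round values stay exact in floating point.
    return (
        x1 * image_width / grid,
        y1 * image_height / grid,
        x2 * image_width / grid,
        y2 * image_height / grid,
    )


# Example: a box reported on the 1000x1000 grid for a 1920x1080 image.
print(rescale_box((250, 500, 750, 1000), image_width=1920, image_height=1080))
# -> (480.0, 540.0, 1440.0, 1080.0)
```

Note that the x and y axes are scaled independently, so a non-square image needs its own width and height rather than a single scale factor.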

ayayakirara avatar Nov 10 '25 01:11 ayayakirara

See #16880

ayayakirara avatar Nov 10 '25 01:11 ayayakirara

@ayayakirara I'm already scaling the output coordinates from the 1000x1000 coordinate frame, and I already have the Hugging Face transformers model performing perfectly. Please look carefully at the coordinate results. The problem is that the results are of approximately the right magnitude but have poor accuracy in 8B (or no coordinates at all for 4B).
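
One way to put a number on "right magnitude but poor accuracy" is to compare the box from each backend for the same image and prompt using intersection over union (IoU). The sketch below is only an illustration: the function name `iou` and the `(x1, y1, x2, y2)` layout are assumptions, and the sample boxes are made up rather than taken from the screenshots.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes in the same frame."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap, clamped to zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Illustrative values only: one box per backend for the same object.
box_transformers = (100, 100, 300, 300)
box_llama_cpp = (150, 120, 350, 320)
print(iou(box_transformers, box_llama_cpp))
```

An IoU near 1.0 means the two backends agree on the box; a value well below that for the same image, prompt, and model weights would support the claim that the localisation itself differs rather than the scaling.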

I did follow the breadcrumbs you gave and found https://github.com/ggml-org/llama.cpp/pull/16878 and https://github.com/ggml-org/llama.cpp/issues/13694

So it looks like the problem is in the clip.cpp implementation, which has already been fixed in some branches. I can see they have used the same model I have and are getting correct coordinates. In particular, clip resizing to min only seems to be one issue that would cause the inaccuracies.

sujitvasanth avatar Nov 10 '25 09:11 sujitvasanth

Do you have STR (steps to reproduce) and the exact commit / version of llama.cpp you are using? The instructions for how to get the llama.cpp version/commit are in the Eval Bug issue template.

ayayakirara avatar Nov 14 '25 03:11 ayayakirara