InternVL
[Question] Inconsistency in prompting the model to output bounding box
Motivation
Hi.
I tried to follow this template to prompt the model:
<image>\nPlease provide the bounding box coordinate of the region this sentence describes: <ref>...</ref>
My first attempt:
<image>\nPlease provide the bounding box coordinate of the region this sentence describes: <ref>traffic light</ref>
The model's output follows the expected format:
<ref>class name</ref><box>[[x1, y1, x2, y2], ...]</box>
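To tell quickly whether a reply follows this format, it can be parsed strictly. The snippet below is a minimal sketch, assuming the reply is a plain string and that the box list is valid JSON. `parse_grounding` and `GROUNDING_RE` are hypothetical names for illustration, not part of InternVL.

```python
import json
import re

# Hypothetical pattern mirroring the format quoted above:
# <ref>class name</ref><box>[[x1, y1, x2, y2], ...]</box>
GROUNDING_RE = re.compile(
    r"<ref>(.*?)</ref>\s*<box>(\[\[.*?\]\])</box>", re.DOTALL
)


def parse_grounding(reply):
    """Return a list of (label, boxes) pairs; empty if the reply is off-format."""
    results = []
    for label, raw_boxes in GROUNDING_RE.findall(reply):
        try:
            boxes = json.loads(raw_boxes)
        except json.JSONDecodeError:
            # The box payload is not a well-formed list of lists; skip it.
            continue
        # Keep only entries where every box has exactly four coordinates.
        if all(isinstance(b, list) and len(b) == 4 for b in boxes):
            results.append((label.strip(), boxes))
    return results
```

An empty result for a given prompt means the model did not produce the grounding format at all, which is a different failure from returning a well-formed but inaccurate box.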
However, for my second attempt:
<image>\nPlease provide the bounding box coordinate of the region this sentence describes: <ref>cup</ref>
The model instead elaborated at length on the steps it was taking and returned incorrect top-left and bottom-right coordinates.
Any idea why this happens, and where I should start debugging?
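One check I can think of is whether the returned coordinates are in pixels or on a normalized scale. The snippet below is a minimal sketch, assuming the model emits boxes on a 0 to 1000 scale relative to the image size. That scale is an assumption to verify against the model's documentation, and `is_plausible_box` and `rescale_box` are hypothetical helper names.

```python
def is_plausible_box(box, scale=1000):
    """Check ordering and range of an [x1, y1, x2, y2] box on the assumed scale."""
    x1, y1, x2, y2 = box
    in_range = all(0 <= v <= scale for v in box)
    return in_range and x1 < x2 and y1 < y2


def rescale_box(box, width, height, scale=1000):
    """Map a box from the assumed normalized scale to pixel coordinates."""
    x1, y1, x2, y2 = box
    return (
        x1 * width / scale,
        y1 * height / scale,
        x2 * width / scale,
        y2 * height / scale,
    )
```

If the rescaled box lines up with the object when drawn on the image, the coordinates were only on a different scale. If it still misses, the problem lies in the model's output itself.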
Appreciate any help provided!