InternVL

[Question] Inconsistency in prompting the model to output bounding box

Open Jayden9912 opened this issue 1 year ago • 0 comments

Motivation

Hi.

I tried to prompt the model using the following template:

<image>\nPlease provide the bounding box coordinate of the region this sentence describes: <ref>...</ref>

My first attempt:

<image>\nPlease provide the bounding box coordinate of the region this sentence describes: <ref>traffic light</ref>

The model's output follows the expected format:

<ref>class name</ref><box>[[x1, y1, x2, y2], ...]</box>

However, for my second attempt:

<image>\nPlease provide the bounding box coordinate of the region this sentence describes: <ref>cup</ref>

The model instead produced a long step-by-step explanation and gave incorrect top-left and bottom-right coordinates.
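To tell the two kinds of reply apart automatically, one option is to check each reply against the `<ref>...</ref><box>[[x1, y1, x2, y2], ...]</box>` format. The following is a minimal sketch, not part of InternVL: the function name `parse_grounding_reply` is hypothetical, the reply is assumed to be a plain string, and the sample replies are made up for illustration rather than real model outputs.

```python
import re

# One <ref>label</ref><box>[[...], ...]</box> pair in a reply.
PAIR_RE = re.compile(r"<ref>(.*?)</ref>\s*<box>(\[\[.*?\]\])</box>", re.DOTALL)
# One inner [x1, y1, x2, y2] list inside the <box> payload.
INNER_RE = re.compile(r"\[([^\[\]]*)\]")
# A signed integer or decimal number.
NUM_RE = re.compile(r"-?\d+(?:\.\d+)?")


def parse_grounding_reply(reply):
    """Return [(label, [[x1, y1, x2, y2], ...]), ...].

    An empty list means the reply did not follow the expected format,
    for example because the model answered in free-form prose.
    """
    results = []
    for label, raw_boxes in PAIR_RE.findall(reply):
        boxes = []
        for raw in INNER_RE.findall(raw_boxes):
            nums = [float(n) for n in NUM_RE.findall(raw)]
            if len(nums) == 4:  # keep only complete boxes
                boxes.append(nums)
        results.append((label.strip(), boxes))
    return results


# Hypothetical replies, for illustration only.
good = "<ref>traffic light</ref><box>[[10, 20, 30, 40]]</box>"
bad = "Step 1: locate the cup. The top-left corner is at (10, 20)."

print(parse_grounding_reply(good))
print(parse_grounding_reply(bad))
```

Running both prompts through a check like this, with identical generation settings, makes it easier to see whether the second prompt fails consistently or only intermittently.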

Any idea why this happens, and where I should start debugging?

Appreciate any help provided!

Related resources

No response

Additional context

No response

Jayden9912 · Sep 25 '24 09:09