The model's grounding capability is still unsatisfactory
Even after fine-tuning, the model's grounding ability falls short of production requirements and lags significantly behind the CogAgent model.
Examples (`label` is the ground truth, `response` is the model output):
{"query": "<ImageHere> In the photograph, could you pinpoint the location of \"ACHADOS E PERDIDOS\" and tell me its bounding boxes?", "label": "The bounding box is [475, 12, 578, 28]", "response": "The bounding box is [528, 69, 643, 119]"}
{"query": "<ImageHere> In, can you guide me to the location of \"THE BIG WAVES JOURNAL\" by providing bounding boxes?", "label": "The bounding box is [522, 0, 628, 82]", "response": "The bounding box is [593, 88, 680, 123]"}
{"query": "<ImageHere> Help me to locate \"Vinyl Fencing\" in and give me its bounding boxes, please.", "label": "The bounding box is [329, 803, 375, 819]", "response": "The bounding box is [352, 839, 423, 857]"}
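
To put a number on the gap, here is a minimal sketch that scores each sample by intersection over union (IoU). It assumes the boxes are `[x1, y1, x2, y2]` corner coordinates and that `label` and `response` share the same coordinate space; neither is confirmed above. The helper names `parse_box` and `iou` are hypothetical and not part of any model code.

```python
import re

# Assumption: boxes are [x1, y1, x2, y2] corners in a shared coordinate space.
BOX_PATTERN = re.compile(r"\[\s*(-?\d+)\s*,\s*(-?\d+)\s*,\s*(-?\d+)\s*,\s*(-?\d+)\s*\]")


def parse_box(text):
    """Return the first four-integer box found in text, or None if absent."""
    match = BOX_PATTERN.search(text)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def iou(a, b):
    """Intersection over union of two corner-format boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Overlap width and height are clamped to zero when the boxes are disjoint.
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# The (label, response) strings from the three samples above.
samples = [
    ("The bounding box is [475, 12, 578, 28]", "The bounding box is [528, 69, 643, 119]"),
    ("The bounding box is [522, 0, 628, 82]", "The bounding box is [593, 88, 680, 123]"),
    ("The bounding box is [329, 803, 375, 819]", "The bounding box is [352, 839, 423, 857]"),
]

scores = [iou(parse_box(label), parse_box(response)) for label, response in samples]
for (label, response), score in zip(samples, scores):
    print(f"label={parse_box(label)} response={parse_box(response)} IoU={score:.3f}")
```

Under the corner-format assumption, none of the three predictions overlaps its label vertically, so each sample scores an IoU of 0. If the boxes are actually in another format (for example `[x, y, w, h]`), the `iou` function would need to be adapted before drawing conclusions.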