InternLM-XComposer

The model's grounding capability is still unsatisfactory

Open WeiminLee opened this issue 1 year ago • 0 comments

The grounding ability of the fine-tuned model still falls short of meeting production requirements, showing a significant gap compared to the CogAgent model.

Examples:

68e2fb4e6c95e66c829f8992aa6fb5a1 {"query": "<ImageHere> In the photograph, could you pinpoint the location of "ACHADOS E PERDIDOS" and tell me its bounding boxes?", "label": "The bounding box is [475, 12, 578, 28]", "response": "The bounding box is [528, 69, 643, 119]"}

a44b4236091c5d3169ae89a3d4e815a2 {"query": "<ImageHere> In, can you guide me to the location of "THE BIG WAVES JOURNAL" by providing bounding boxes?", "label": "The bounding box is [522, 0, 628, 82]", "response": "The bounding box is [593, 88, 680, 123]"}

fc90916946817c44ca102f46343a3698 {"query": "<ImageHere> Help me to locate "Vinyl Fencing" in and give me its bounding boxes, please.", "label": "The bounding box is [329, 803, 375, 819]", "response": "The bounding box is [352, 839, 423, 857]"}
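To put a number on the gap in these samples, the label and response boxes can be compared by intersection over union (IoU). The sketch below is only illustrative: it assumes the four numbers are `[x1, y1, x2, y2]` corner coordinates in the same coordinate space for label and response, which the issue does not state, and `parse_box` and `iou` are hypothetical helper names, not part of InternLM-XComposer.

```python
import re

# Assumption: boxes are [x1, y1, x2, y2] corners; the issue does not confirm this.
BOX_PATTERN = re.compile(r"\[\s*(-?\d+)\s*,\s*(-?\d+)\s*,\s*(-?\d+)\s*,\s*(-?\d+)\s*\]")


def parse_box(text):
    """Extract the first four-integer box from a string such as a label or response."""
    match = BOX_PATTERN.search(text)
    if match is None:
        raise ValueError(f"no bounding box found in: {text!r}")
    return tuple(int(group) for group in match.groups())


def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# (label, response) pairs copied from the three samples in this issue.
examples = [
    ("The bounding box is [475, 12, 578, 28]", "The bounding box is [528, 69, 643, 119]"),
    ("The bounding box is [522, 0, 628, 82]", "The bounding box is [593, 88, 680, 123]"),
    ("The bounding box is [329, 803, 375, 819]", "The bounding box is [352, 839, 423, 857]"),
]

scores = [iou(parse_box(label), parse_box(response)) for label, response in examples]
for score in scores:
    print(f"IoU = {score:.3f}")
```

Under that coordinate assumption, none of the three predicted boxes overlaps its label vertically, so each pair scores an IoU of 0.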

WeiminLee, Jun 06 '24 11:06