
Is XComposer2-4KHD capable of REC and detection?

Open zihui-debug opened this issue 8 months ago • 0 comments

Hi, I tried to evaluate XComposer2-4KHD on RefCOCO for the REC task, following https://github.com/InternLM/InternLM-XComposer/issues/261, but the results are quite poor. Do the coordinates in the response need to be post-processed as with other MLLMs (e.g., for qwen2.5vl, the coordinates must be rescaled from the input resolution to the actual resolution of the image)? Also, I'm wondering whether XComposer2-4KHD supports detection tasks. If so, could you please provide guidance on how such an evaluation should be performed?
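To make the question concrete, here is a minimal sketch of the kind of post-processing I mean: rescaling a predicted box from the coordinate space the model answers in to the original image size, then scoring it by IoU against the ground truth (REC on RefCOCO is usually reported as accuracy at IoU >= 0.5). The function names `rescale_box` and `iou` are hypothetical helpers, and the source coordinate space used in the example is only an assumption; which space XComposer2-4KHD actually uses is exactly what I am unsure about.

```python
def rescale_box(box, src_size, dst_size):
    """Map an (x1, y1, x2, y2) box from a source (width, height) space
    to a destination (width, height) space."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumption for illustration only: the model answered in a 1000x500 space
# and the original image is 2000x1000 pixels.
pred = rescale_box((100, 50, 200, 150), src_size=(1000, 500), dst_size=(2000, 1000))
gt = (200, 100, 400, 300)
is_correct = iou(pred, gt) >= 0.5  # counted as a hit for REC accuracy
```

If the model instead answers in coordinates normalized to a fixed range, the same `rescale_box` call would apply with `src_size` set to that range.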


zihui-debug, May 15 '25 12:05