InternLM-XComposer
Is XComposer2-4KHD capable of REC and detection?
Hi, I tried to evaluate XComposer2-4KHD on RefCOCO for the REC task, following https://github.com/InternLM/InternLM-XComposer/issues/261. The results are quite poor. Do the coordinates in the response need to be post-processed, as with other MLLMs (e.g., for Qwen2.5-VL, the coordinates have to be rescaled from the input resolution to the actual resolution of the image)? Also, does XComposer2-4KHD support detection tasks? If so, could you please provide guidance on how such an evaluation should be performed?
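To make the question concrete, here is a minimal sketch of the kind of post-processing and scoring I have in mind. It assumes the predicted box is `[x1, y1, x2, y2]` in the pixel space of the resized model input; whether XComposer2-4KHD actually follows that convention is exactly what I am unsure about. The function names (`rescale_box`, `iou`, `rec_accuracy`) are hypothetical and not part of this repository, and the 0.5 IoU threshold is the usual REC convention.

```python
def rescale_box(box, input_size, image_size):
    """Map a box from the model-input resolution to the original image resolution.

    box:        [x1, y1, x2, y2] in input-resolution pixels (assumed format)
    input_size: (width, height) of the resized image fed to the model
    image_size: (width, height) of the original image
    """
    in_w, in_h = input_size
    img_w, img_h = image_size
    sx, sy = img_w / in_w, img_h / in_h
    x1, y1, x2, y2 = box
    return [x1 * sx, y1 * sy, x2 * sx, y2 * sy]


def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def rec_accuracy(pred_boxes, gt_boxes, threshold=0.5):
    """Fraction of predictions whose IoU with the ground truth reaches the threshold."""
    if not gt_boxes:
        return 0.0
    hits = sum(iou(p, g) >= threshold for p, g in zip(pred_boxes, gt_boxes))
    return hits / len(gt_boxes)
```

If the model instead emits coordinates normalized to a fixed range, the same `rescale_box` call would apply with `input_size` set to that range rather than to the resized image size.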