InternVL [Bug] Object detection with InternVL3: coordinate drift

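When I use InternVL3 for object detection, the returned coordinates are not in the original image's coordinate space and the offset is large. The 14B and 8B models give the same result.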

[Image]

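prompt: "You are an advanced visual analysis model. Strictly follow these steps:

Detect all persons in the image and generate the bounding box coordinates for each person (format: x1,y1,x2,y2, in pixel values).
Output strict JSON containing the following fields: { "persons": [ { "bbox": [x1, y1, x2, y2] }, ... ] } "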

Output: { "persons": [ { "bbox": [155, 180, 440, 700] } ] }

Apr 25 '25 08:04 deepblacksky

The documentation says that the grounding output uses relative coordinates.

def normalize_coordinates(box, image_width, image_height):
    x1, y1, x2, y2 = box
    normalized_box = [
        round((x1 / image_width) * 1000),
        round((y1 / image_height) * 1000),
        round((x2 / image_width) * 1000),
        round((y2 / image_height) * 1000)
    ]
    return normalized_box
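If the model's boxes are on that 0-1000 relative scale, mapping them back onto the original image would be the inverse of the function above. A minimal sketch, assuming that scale applies to the output; `denormalize_coordinates` and the 1920x1080 image size are hypothetical and not taken from the InternVL docs:

```python
def denormalize_coordinates(box, image_width, image_height, scale=1000):
    """Map a box from the [0, scale] relative range back to pixel coordinates.

    Assumption: the model reports boxes relative to the image size on a
    0..scale grid (scale=1000 mirrors normalize_coordinates above).
    """
    x1, y1, x2, y2 = box
    return [
        round(x1 / scale * image_width),
        round(y1 / scale * image_height),
        round(x2 / scale * image_width),
        round(y2 / scale * image_height),
    ]


# Box from the report, mapped onto a hypothetical 1920x1080 image.
pixel_box = denormalize_coordinates([155, 180, 440, 700], 1920, 1080)
print(pixel_box)
```

If the rescaled box still misses the person, the mismatch is probably not just a coordinate-scale issue.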

Apr 26 '25 09:04 zliucz

@zliucz After normalization, the bboxes are still far off. I wonder if this is a model issue.

Jun 29 '25 23:06 lilyzhng

I see the same thing in my tests: the grounding results are a mess, and there is no official code that reproduces grounding through lmdeploy API inference.

Jul 30 '25 03:07 ZanePoe

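To me it looks like the bbox is positioned relative to the image as it was fed to the model. Maybe try asking it to output normalized results, or try resizing your image:

import cv2

image_path = "437349580-75fdd40f-0cdc-46ac-81eb-74a24d22d873.png"

img = cv2.imread(image_path)
# Resize to 1024x1024 to check whether the box lines up in that frame
img = cv2.resize(img, (1024, 1024))
# Draw the reported box in red (OpenCV uses BGR order)
cv2.rectangle(img, (155, 180), (440, 700), (0, 0, 255), 2)
cv2.imshow("img", img)
cv2.waitKey()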

Aug 07 '25 05:08 Xx46883339

same problem

Aug 11 '25 03:08 Lwen1243

cogVLM grounds far better than InternVL does. https://huggingface.co/zai-org/cogvlm-grounding-generalist-hf

Sep 18 '25 16:09 hwang136