[BUG] <title> Model output box coordinates are not relative to the original input image

Open AlfaRomeo9527 opened this issue 1 year ago • 3 comments

Is there an existing issue / discussion for this?

  • [X] I have searched the existing issues / discussions

Is there an existing answer for this in FAQ?

  • [X] I have searched FAQ

Current Behavior

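demo_highfive High five (击掌) (536,509),(588,602)

With the result given in the example, the box coordinates do not mark the high-five location when drawn on the original image, as the image below shows: test

Expected Behavior

The image is presumably preprocessed (for example, resized) before it is fed to the model, and the model then returns coordinates in the preprocessed image's frame. What is this preprocessing, and how can the box coordinates relative to the original input image be computed from the output coordinates?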

Steps To Reproduce

No response

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

Anything else?

No response

AlfaRomeo9527 Feb 21 '24 08:02

Qwen's tokenizer normalizes the coordinates, as you can see in tokenization_tokenization.py:

for box in boxes:
    if 'ref' in box:  # random new color for new refexps
        color = random.choice([_ for _ in mcolors.TABLEAU_COLORS.keys()])
    x1, y1, x2, y2 = box['box']
    x1, y1, x2, y2 = (int(x1 / 1000 * w), int(y1 / 1000 * h), int(x2 / 1000 * w), int(y2 / 1000 * h))
    visualizer.draw_box((x1, y1, x2, y2), alpha=1, edge_color=color)

danjuan-77 Feb 22 '24 05:02

So does Qwen-VL's output bbox need to be multiplied by 1000 and then divided by image_size?

yubo97 Feb 28 '24 06:02

Yes. After applying this operation, the coordinates draw the correct box on the original image.

danjuan-77 Feb 29 '24 02:02
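
For reference, the conversion discussed in this thread can be sketched roughly as follows. Note that the quoted loop divides each coordinate by 1000 and then multiplies by the image width or height, so this sketch assumes the model's coordinates sit on a 0 to 1000 grid. The helper names `parse_boxes` and `to_pixels` and the 2000x1000 image size are hypothetical, chosen only for illustration, and integer arithmetic is used in place of the float expression in the quoted code to avoid rounding surprises.

```python
import re

# Assumption: model coordinates are normalized to a 0-1000 grid,
# as implied by the `x1 / 1000 * w` expression quoted in this thread.
NORM = 1000

# Matches one "(x1,y1),(x2,y2)" pair such as "(536,509),(588,602)".
BOX_PATTERN = re.compile(r"\((\d+),\s*(\d+)\),\s*\((\d+),\s*(\d+)\)")


def parse_boxes(text):
    """Return every (x1, y1, x2, y2) tuple found in a model response string."""
    return [tuple(int(v) for v in m.groups()) for m in BOX_PATTERN.finditer(text)]


def to_pixels(box, width, height):
    """Scale a normalized box to pixel coordinates of a width x height image."""
    x1, y1, x2, y2 = box
    return (
        x1 * width // NORM,
        y1 * height // NORM,
        x2 * width // NORM,
        y2 * height // NORM,
    )


# Hypothetical example: the output from the issue, on an assumed 2000x1000 image.
boxes = parse_boxes("击掌(536,509),(588,602)")
print(to_pixels(boxes[0], width=2000, height=1000))  # (1072, 509, 1176, 602)
```

Use the width and height of the original image, not of any resized copy. The resulting tuple can then be passed to whatever drawing routine is in use.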