Qwen-VL [BUG] <title> Model output box coordinates are not relative to the original input image

Is there an existing issue / discussion for this?

[X] I have searched the existing issues / discussions

Is there an existing answer for this in FAQ?

[X] I have searched FAQ

Current Behavior

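(image: demo_highfive) Model output: 击掌 ("high five") (536,509),(588,602)

With the result given in the example, drawing these box coordinates on the original image does not mark the high-five location, as shown in the figure below. (image: test)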

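Expected Behavior

The image was presumably preprocessed (for example, resized) before being fed to the model, so the returned coordinates refer to the preprocessed image. What is this preprocessing, and how can the output coordinates be converted back into box coordinates on the original input image?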

Steps To Reproduce

No response

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

Anything else?

No response

Feb 21 '24 08:02 AlfaRomeo9527

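Qwen's box coordinates are normalized to a 0-1000 range; the tokenizer's drawing code in tokenization_tokenization.py shows how they are mapped back to pixels:

import random
import matplotlib.colors as mcolors
# Excerpt: boxes, w, h (original image width/height), visualizer and an
# initial color are provided by the enclosing tokenizer method.
for box in boxes:
    if 'ref' in box:  # random new color for new refexps
        color = random.choice(list(mcolors.TABLEAU_COLORS.keys()))
    x1, y1, x2, y2 = box['box']
    # Box values are normalized to 0-1000: divide by 1000, then scale by the image size.
    x1, y1, x2, y2 = (int(x1 / 1000 * w), int(y1 / 1000 * h), int(x2 / 1000 * w), int(y2 / 1000 * h))
    visualizer.draw_box((x1, y1, x2, y2), alpha=1, edge_color=color)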
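For use outside the tokenizer, the same scaling can be applied directly to the text the model returns. The sketch below is a minimal, hedged example: `parse_boxes` and `denormalize_box` are hypothetical helper names, the regex assumes the `(x1,y1),(x2,y2)` form printed in the issue (it also matches that form when wrapped in tags such as `<box>...</box>`), and the 640 × 480 image size is a made-up value for illustration. It computes `value * size // 1000`, which equals the snippet's `int(value / 1000 * size)` for non-negative values but stays in integer arithmetic.

```python
import re
from typing import List, Tuple

Box = Tuple[int, int, int, int]

# Matches "(x1,y1),(x2,y2)" with optional whitespace, e.g. "击掌(536,509),(588,602)".
_BOX_PATTERN = re.compile(
    r"\(\s*(\d+)\s*,\s*(\d+)\s*\)\s*,\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)"
)


def parse_boxes(text: str) -> List[Box]:
    """Extract every (x1,y1),(x2,y2) pair; values are the model's 0-1000 normalized coordinates."""
    return [tuple(int(v) for v in match) for match in _BOX_PATTERN.findall(text)]


def denormalize_box(box: Box, width: int, height: int) -> Box:
    """Map a 0-1000 normalized box to pixel coordinates of a width x height image."""
    x1, y1, x2, y2 = box
    # Divide by 1000 and scale by the original size, truncating like int() in the tokenizer.
    return (
        x1 * width // 1000,
        y1 * height // 1000,
        x2 * width // 1000,
        y2 * height // 1000,
    )


if __name__ == "__main__":
    response = "击掌(536,509),(588,602)"
    # Hypothetical original size; use the real one, e.g. PIL's Image.open(path).size -> (width, height).
    width, height = 640, 480
    for norm_box in parse_boxes(response):
        print(denormalize_box(norm_box, width, height))
```

With the hypothetical 640 × 480 size this prints `(343, 244, 376, 288)`; with the real image size the result is the box to draw on the original image.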

Feb 22 '24 05:02 danjuan-77


So Qwen-VL's output bbox has to be divided by 1000 and then multiplied by the image size?

Feb 28 '24 06:02 yubo97


Yes. After applying this operation, the coordinates draw the correct box on the original image.
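Written out as a formula, the tokenizer's mapping is the following, where W and H are the original image's width and height in pixels. The worked line applies it to the high-five box from the issue using a hypothetical 640 × 480 image, chosen only to illustrate the arithmetic; the floor matches `int()` truncation because the coordinates are non-negative.

```latex
\begin{aligned}
x_{\text{px}} &= \left\lfloor \tfrac{x_{\text{norm}}}{1000}\, W \right\rfloor, \qquad
y_{\text{px}} = \left\lfloor \tfrac{y_{\text{norm}}}{1000}\, H \right\rfloor \\
(536, 509),\,(588, 602) &\;\mapsto\; (\lfloor 343.04 \rfloor, \lfloor 244.32 \rfloor),\,(\lfloor 376.32 \rfloor, \lfloor 288.96 \rfloor) = (343, 244),\,(376, 288)
\end{aligned}
```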

Feb 29 '24 02:02 danjuan-77
