
[Fixed] Clone in DETR Is Fully Wrong!

zhiyuanyou opened this issue 2 years ago

Hi~ Thanks for your excellent work. However, I think what you do in Fig. 4 for DETR is completely wrong.

In Fig. 4, you conduct dynamic-box experiments with DETR, using clone as the padding method.

Actually, cloning the query embeddings means cloning the output bounding boxes. The drop in mAP occurs because every repeated bounding box is counted as a false positive, since each ground truth can be hit only once. In other words, the predicted bounding boxes are the same as before, just repeated several times!

In this case, I think you cannot claim that the performance of DETR degrades.
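The effect can be seen with a toy evaluation loop (a minimal sketch of the greedy matching rule, not the actual COCO evaluator; the box format and the `iou_thr=0.5` threshold are assumptions for illustration):

```python
def iou(a, b):
    # boxes as (x1, y1, x2, y2)
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def count_tp_fp(preds, gts, iou_thr=0.5):
    """Greedy matching: each ground truth can be matched at most once,
    so any repeat of an already-matched box counts as a false positive."""
    unmatched = list(range(len(gts)))
    tp = fp = 0
    for p in preds:  # assumed sorted by confidence
        hit = next((g for g in unmatched if iou(p, gts[g]) >= iou_thr), None)
        if hit is None:
            fp += 1
        else:
            unmatched.remove(hit)
            tp += 1
    return tp, fp

gts = [(10, 10, 50, 50)]
preds = [(10, 10, 50, 50)] * 2   # the same correct box, cloned once
print(count_tp_fp(preds, gts))   # (1, 1): the clone becomes a false positive
```

The prediction itself is unchanged; only the duplicate is penalized, which is exactly why the AP drops without NMS.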

Also, with random padding, some predicted bounding boxes may be near-duplicates of existing ones. If you pad the query embeddings for DETR, it is only fair to apply NMS as the post-processing method.

Attached: cloning does not change the results of nn.MultiheadAttention. Also, obviously, cloning does not change the results of the MLP. Therefore, cloning does not change the per-query results of DETR.

import torch
import torch.nn as nn

q = torch.rand((100, 256))
k = torch.rand((16, 256))
v = k
attn = nn.MultiheadAttention(embed_dim=256, num_heads=8)
res1 = attn(q, k, v)
q2 = torch.cat([q] * 2, dim=0)  # clone the queries
res2 = attn(q2, k, v)
# Both halves of the cloned output match the original up to floating-point noise:
print((res1[0] - res2[0][:100]).abs().max())  # ~1e-7
print((res1[0] - res2[0][100:]).abs().max())  # ~1e-7

zhiyuanyou · Nov 24 '22 03:11

Hi,

Thanks for your interest.

We agree with you that cloned queries produce the same predictions, and that NMS is needed as post-processing.
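The suppression step can be sketched with a plain-Python NMS (a minimal illustration, not the actual DETR/DiffusionDet post-processing; `iou_thr=0.5` is an assumed threshold):

```python
def iou(a, b):
    # boxes as (x1, y1, x2, y2)
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def nms(boxes, scores, iou_thr=0.5):
    """Keep the highest-scoring box, drop everything that overlaps it,
    and repeat. Exact clones have IoU 1.0 and are always suppressed."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thr]
    return keep

# Cloning queries duplicates every prediction; NMS removes the duplicates.
boxes = [(10, 10, 50, 50), (60, 60, 90, 90)] * 2
scores = [0.9, 0.8] * 2
kept = nms(boxes, scores)
print(len(kept))  # 2: one box per distinct prediction survives
```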

Even so, in the best case, DETR would keep the same AP and would not bring gains.

We have fixed this issue in our revised version, which will be released soon.

Regards, Shoufa

ShoufaChen · Nov 24 '22 03:11

Thanks for your reply, and thanks again for your excellent work, which provides a new perspective on object detection.

zhiyuanyou · Nov 24 '22 03:11

Hello everyone,

Thanks for your interest in our work.

We have fixed the DETR baseline with the dynamic box setting by adopting NMS when $N_{eval} > N_{train}$.
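For reference, clone padding up to a larger evaluation budget amounts to repeating the trained query set until $N_{eval}$ slots are filled. A toy sketch in plain Python (the real model pads learned query embeddings, not integers; `clone_pad` is a hypothetical helper name):

```python
def clone_pad(queries, n_eval):
    """Repeat the trained queries until n_eval slots are filled."""
    reps = -(-n_eval // len(queries))  # ceiling division
    return (queries * reps)[:n_eval]

trained = list(range(100))              # stand-ins for 100 query embeddings
padded = clone_pad(trained, 300)
print(len(padded))                      # 300
print(padded[:100] == padded[100:200])  # True: predictions repeat too
```

Because the padded slots are exact copies, every extra prediction is a duplicate, which is why NMS is required whenever $N_{eval} > N_{train}$.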

The updated results are shown in this figure:

[Figure: AP vs. number of evaluation boxes for DETR (clone + NMS)]

The detailed results are:

| Number of boxes | 50 | 100 | 300 | 500 | 1000 | 2000 | 4000 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| AP | 30.9762 | 34.8672 | 38.7824 | 38.3824 | 38.4298 | 38.4038 | 38.4097 |

Conclusion: DETR with cloned queries and NMS still shows slight performance degradation when $N_{eval} > N_{train}$. In contrast, our DiffusionDet gains performance when using more evaluation boxes.

ShoufaChen · Dec 08 '22 07:12