
[Fixed] Clone in DETR Is Fully Wrong!

zhiyuanyou opened this issue 2 years ago

Hi~ Thanks for your excellent work. However, I think what you do in Fig. 4 for DETR is completely wrong.

In Fig. 4, you conduct dynamic-box experiments with DETR, using clone as the padding method.

Actually, cloning the query embeddings means cloning the output bounding boxes. The drop in mAP occurs because every repeated bounding box is counted as a false positive, since each ground truth can be hit only once. In other words, the predicted bounding boxes are the same as before, just repeated several times!

In this case, I think you cannot claim that the performance of DETR degrades.
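The effect can be seen with a toy evaluation loop (a minimal sketch of the greedy matching rule, not the actual COCO evaluator; the box format and the `iou_thr=0.5` threshold are assumptions for illustration):

```python
def iou(a, b):
    # boxes as (x1, y1, x2, y2)
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def count_tp_fp(preds, gts, iou_thr=0.5):
    """Greedy matching: each ground truth can be matched at most once,
    so any repeat of an already-matched box counts as a false positive."""
    unmatched = list(range(len(gts)))
    tp = fp = 0
    for p in preds:  # assumed sorted by confidence
        hit = next((g for g in unmatched if iou(p, gts[g]) >= iou_thr), None)
        if hit is None:
            fp += 1
        else:
            unmatched.remove(hit)
            tp += 1
    return tp, fp

gts = [(10, 10, 50, 50)]
preds = [(10, 10, 50, 50)] * 2   # the same correct box, cloned once
print(count_tp_fp(preds, gts))   # (1, 1): the clone becomes a false positive
```

The prediction itself is unchanged; only the duplicate is penalized, which is exactly why the AP drops without NMS.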

Also, with random padding, some predicted bounding boxes may be near-duplicates of existing ones. If you pad the query embeddings for DETR, it is only fair to apply NMS as the post-processing method.

Attached: cloning does not change the results of nn.MultiheadAttention. Also, obviously, cloning does not change the results of the MLP. Therefore, cloning does not change the per-query results of DETR.

import torch
import torch.nn as nn

q = torch.rand((100, 256))
k = torch.rand((16, 256))
v = k
attn = nn.MultiheadAttention(embed_dim=256, num_heads=8)
res1 = attn(q, k, v)
q2 = torch.cat([q] * 2, dim=0)  # clone the queries
res2 = attn(q2, k, v)
# Both halves of the cloned output match the original up to floating-point noise:
print((res1[0] - res2[0][:100]).abs().max())  # ~1e-7
print((res1[0] - res2[0][100:]).abs().max())  # ~1e-7

zhiyuanyou · Nov 24 '22 03:11

Hi,

Thanks for your interest.

We agree with you that cloned queries produce the same predictions, and that NMS is needed as post-processing.
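The suppression step can be sketched with a plain-Python NMS (a minimal illustration, not the actual DETR/DiffusionDet post-processing; `iou_thr=0.5` is an assumed threshold):

```python
def iou(a, b):
    # boxes as (x1, y1, x2, y2)
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def nms(boxes, scores, iou_thr=0.5):
    """Keep the highest-scoring box, drop everything that overlaps it,
    and repeat. Exact clones have IoU 1.0 and are always suppressed."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thr]
    return keep

# Cloning queries duplicates every prediction; NMS removes the duplicates.
boxes = [(10, 10, 50, 50), (60, 60, 90, 90)] * 2
scores = [0.9, 0.8] * 2
kept = nms(boxes, scores)
print(len(kept))  # 2: one box per distinct prediction survives
```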

Even so, in the best case, DETR would keep the same AP and would not bring gains.

We have fixed this issue in our revised version, which will be released soon.

Regards, Shoufa

ShoufaChen · Nov 24 '22 03:11

Thanks for your reply, and thanks again for your excellent work, which provides a new perspective on object detection.

zhiyuanyou · Nov 24 '22 03:11

Hello everyone,

Thanks for your interest in our work.

We have fixed the DETR baseline with the dynamic box setting by adopting NMS when $N_{eval} > N_{train}$.
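For reference, clone padding up to a larger evaluation budget amounts to repeating the trained query set until $N_{eval}$ slots are filled. A toy sketch in plain Python (the real model pads learned query embeddings, not integers; `clone_pad` is a hypothetical helper name):

```python
def clone_pad(queries, n_eval):
    """Repeat the trained queries until n_eval slots are filled."""
    reps = -(-n_eval // len(queries))  # ceiling division
    return (queries * reps)[:n_eval]

trained = list(range(100))              # stand-ins for 100 query embeddings
padded = clone_pad(trained, 300)
print(len(padded))                      # 300
print(padded[:100] == padded[100:200])  # True: predictions repeat too
```

Because the padded slots are exact copies, every extra prediction is a duplicate, which is why NMS is required whenever $N_{eval} > N_{train}$.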

The updated results are shown in this figure:

[Figure: AP vs. number of evaluation boxes for DETR (clone + NMS)]

The detailed results are:

| Number of boxes | 50 | 100 | 300 | 500 | 1000 | 2000 | 4000 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| AP | 30.9762 | 34.8672 | 38.7824 | 38.3824 | 38.4298 | 38.4038 | 38.4097 |

Conclusion: DETR with cloned queries and NMS still shows slight performance degradation when $N_{eval} > N_{train}$. In contrast, our DiffusionDet gains performance when using more evaluation boxes.

ShoufaChen · Dec 08 '22 07:12