DiffusionDet
[Fixed] Cloning Queries in DETR Is Incorrect!
Hi~ Thanks for your excellent work. However, I think what you do in Fig. 4 for DETR is incorrect.
In Fig. 4, you conduct dynamic-box experiments with DETR and use cloning as the padding method.
Actually, cloning query embeddings means cloning the output bounding boxes. The drop in mAP occurs because all repeated bounding boxes are treated as false positives, since each ground-truth box can be matched only once. In other words, the predicted bounding boxes are the same as before, just repeated several times!
In this case, I do not think you can claim that the performance of DETR degrades.
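The duplicate-as-false-positive effect can be sketched with a toy greedy matcher. This is a hypothetical illustration (not the actual COCO evaluator, and `iou`/`count_tp_fp` are made-up helpers): once a ground-truth box is claimed by one prediction, every exact clone of that prediction is counted as a false positive.

```python
# Toy sketch of one-to-one matching: each ground-truth box can be claimed
# by at most one prediction, so cloned predictions become false positives.

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def count_tp_fp(preds, gts, thresh=0.5):
    """Predictions are assumed sorted by descending confidence."""
    matched = set()
    tp = fp = 0
    for p in preds:
        best = max(range(len(gts)), key=lambda i: iou(p, gts[i]))
        if best not in matched and iou(p, gts[best]) >= thresh:
            matched.add(best)
            tp += 1
        else:
            fp += 1
    return tp, fp

gt = [(0, 0, 10, 10)]
preds = [(0, 0, 10, 10)] * 3  # one perfect box, cloned three times
print(count_tp_fp(preds, gt))  # (1, 2): only the first clone is a true positive
```

With more clones, the true-positive count stays fixed while false positives grow, which is exactly why the reported mAP drops without the predictions actually getting worse.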
Also, for random padding, there may also be some bounding boxes that nearly duplicate existing ones. If you pad the query embeddings for DETR, it is only fair to apply NMS as a post-processing step.
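A minimal NMS sketch in plain Python shows why this post-processing step neutralizes the cloning artifact (`iou` and `nms` here are illustrative helpers, not DETR's or torchvision's code): clones of an already-kept box have IoU 1.0 with it and are suppressed.

```python
# Minimal NMS sketch: keep boxes in descending score order, dropping any box
# whose IoU with an already-kept box meets the threshold. Exact clones have
# IoU 1.0 with the kept original and are always suppressed.

def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def nms(boxes, scores, thresh=0.5):
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10)] * 3 + [(50, 50, 60, 60)]
scores = [0.9, 0.9, 0.9, 0.8]
print(nms(boxes, scores))  # [0, 3]: the two clones of box 0 are suppressed
```

After NMS, the cloned padding contributes no extra false positives, so the comparison against DiffusionDet becomes fair.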
Attached: cloning does not change the results of nn.MultiheadAttention, and, obviously, cloning does not change the results of an MLP. Therefore, cloning does not change the results of DETR.
```python
import torch
import torch.nn as nn

# 100 queries of dimension 256 attending to 16 key/value vectors.
q = torch.rand((100, 256))
k = torch.rand((16, 256))
v = k

attn = nn.MultiheadAttention(embed_dim=256, num_heads=8)
res1 = attn(q, k, v)

# Clone the queries by concatenating them with themselves.
q2 = torch.cat([q] * 2, dim=0)
res2 = attn(q2, k, v)

# Both halves of the cloned output match the original up to float noise.
print((res1[0] - res2[0][:100]).abs().max())  # ~5.3644e-07, numerically zero
print((res1[0] - res2[0][100:]).abs().max())  # ~5.9605e-07, numerically zero
```
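The same check applies to per-query MLP heads. Here is a minimal sketch, assuming a generic stand-in `nn.Sequential` MLP (not DETR's actual prediction head): because the MLP acts on each query row independently, cloned queries produce identical outputs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in MLP for illustration; DETR's actual box head differs.
mlp = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 4))

q = torch.rand(100, 256)
q2 = torch.cat([q, q], dim=0)  # clone the queries

with torch.no_grad():
    out1 = mlp(q)
    out2 = mlp(q2)

# The MLP is applied row-wise, so both halves reproduce out1.
print(torch.allclose(out1, out2[:100]))  # True
print(torch.allclose(out1, out2[100:]))  # True
```

Since both the attention layers and the MLP heads are invariant to cloning in this sense, the full DETR decoder produces the same set of boxes, just duplicated.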
Hi,
Thanks for your interest.
We agree with you that cloned queries will produce the same predictions, and NMS is needed for post-processing.
Even so, in the best case DETR would only keep the same AP and would not bring any gains.
We have fixed this issue in our revised version, which will be released soon.
Regards, Shoufa
Thanks for your reply. Also, thanks again for your excellent work, which provides a new perspective for understanding object detection.
Hello everyone,
Thanks for your interest in our work.
We have fixed the DETR baseline with the dynamic box setting by adopting NMS when $N_{eval} > N_{train}$.
The updated results are shown in this figure:
The detailed results are:
| number of boxes | 50 | 100 | 300 | 500 | 1000 | 2000 | 4000 |
|---|---|---|---|---|---|---|---|
| AP | 30.9762 | 34.8672 | 38.7824 | 38.3824 | 38.4298 | 38.4038 | 38.4097 |
Conclusion: DETR with cloned queries and NMS still shows a slight performance degradation when $N_{eval} > N_{train}$. In contrast, our DiffusionDet gains performance when using more evaluation boxes.