DDQ
Some questions about the DDQ-DETR code
Hi, thanks for open-sourcing such a wonderful work! However, I got confused while reading the DDQ-DETR code in the file ddq_detr.py.
1. As far as I can understand, both the rows and the columns of the 2D matrix distinct_query_mask corresponding to the distinct queries are set to False. In line 46 and line 584, you take the first row to get the indices of the distinct queries, but I don't think this operation works: if the first query is selected as a distinct query, the first row of distinct_query_mask will be all False.
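To make the concern concrete, here is a minimal numpy sketch of the mask logic described above (illustrative only; the variable names besides distinct_query_mask's role are my own, not taken from ddq_detr.py):

```python
import numpy as np

num_queries = 4

# Case 1: query 0 is among the distinct queries -- the problematic case.
distinct = [0, 2]
mask = np.ones((num_queries, num_queries), dtype=bool)
mask[distinct, :] = False  # rows of distinct queries set to False
mask[:, distinct] = False  # columns of distinct queries set to False

# Recovering the distinct indices from the first row, as the issue describes:
recovered = np.nonzero(~mask[0])[0]
print(recovered)  # -> [0 1 2 3]: row 0 is all-False, so every index looks distinct

# Case 2: query 0 is NOT distinct -- here the first-row trick works as intended.
mask2 = np.ones((num_queries, num_queries), dtype=bool)
mask2[[1, 2], :] = False
mask2[:, [1, 2]] = False
print(np.nonzero(~mask2[0])[0])  # -> [1 2], correct
```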
2. I see that both the paper and the code perform DQS before each decoder layer. However, if two distinct queries unfortunately produce similar predictions, the classification loss will still be disturbed, because you use the predictions of a certain layer, rather than the queries, to match the ground truth. I wonder, have you tried doing DQS after each decoder layer?
Thank you for your interest. For question 2, I guess your confusion can be resolved by line 583 (l_id - 1): we do NMS on the predictions of the last stage and cache the result for this stage, which is different from the one-stage model.
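A rough sketch of the mechanism described above, in plain Python (the function names and threshold are illustrative, not taken from the repository): DQS runs NMS on the *previous* decoder layer's predictions, and the surviving indices are cached for use in the current layer.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms_keep(boxes, scores, thr=0.7):
    """Return indices kept by greedy NMS, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < thr for j in keep):
            keep.append(i)
    return keep

# Predictions from the previous decoder layer (layer l_id - 1):
prev_boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
prev_scores = [0.9, 0.8, 0.7]

# Cache the distinct-query indices once, then reuse them in layer l_id:
cached_keep = nms_keep(prev_boxes, prev_scores)
print(cached_keep)  # -> [0, 2]: queries 0 and 1 overlap heavily, so 1 is suppressed
```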
Regarding question 1, you are correct that the implementation may result in a logic error when the first query is selected as a distinct query. Caching the real_keep_index directly may be a better approach.
However, this scenario may be relatively rare and may not have a significant impact on performance. I have tried another implementation where non-distinct queries are removed from the decoder directly, but it resulted in slightly lower performance than the current implementation.
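The suggested fix can be sketched in a few lines of numpy (real_keep_index is the only name taken from the reply; the rest is illustrative): keep the index list produced by DQS itself instead of re-deriving it from the first row of the mask.

```python
import numpy as np

num_queries = 4
real_keep_index = np.array([0, 2])  # indices selected by DQS, cached directly

# The attention mask is still built from these indices as before...
mask = np.ones((num_queries, num_queries), dtype=bool)
mask[real_keep_index, :] = False
mask[:, real_keep_index] = False

# ...but later stages read the cached indices rather than mask[0],
# which stays correct even when query 0 is distinct.
print(real_keep_index)  # -> [0 2]
```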
Thank you for bringing this to our attention!
It's very kind of you to reply so quickly! I've seen that you do DQS before each decoder layer at line 583. What I'm curious about is why you chose to do DQS on the predictions from the last layer in DETR. Have you tried doing DQS on the predictions of the current layer in DETR, just like what you did in FCN & R-CNN?
During the camera-ready period, we added results for DDQ-DETR that were not included in our submission, so I did not have enough time to try other implementations. Performing DQS on the predictions of the current layer in DETR is also reasonable. I ran a similar ablation study on Sparse R-CNN (on each decoder layer) in a very early version (perhaps before ECCV 2022) and obtained comparable results for the two implementations. You can try it on DDQ-DETR.
Got it! Thank you so much for such a quick reply.