DAB-DETR
Why does modulating attention by w & h work?
I have a question about this line: https://github.com/IDEA-opensource/DAB-DETR/blob/main/models/DAB_DETR/transformer.py#L242
refHW_cond = self.ref_anchor_head(output).sigmoid() # nq, bs, 2
This line asks the model to learn the absolute values of w and h from 'output', but no supervision is applied to these predictions. Besides, the 'output' tensor is also used to predict the offset of the bbox (x, y, w, h).
So I am wondering whether the model can actually learn the width and height as expected?
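For context, the modulation I am asking about looks roughly like the sketch below. It is a minimal, self-contained paraphrase of the surrounding code in transformer.py, not the repo's exact implementation: the tensor contents, the toy sizes, and the simple nn.Sequential head are my own stand-ins.

import torch
import torch.nn as nn

d_model, nq, bs = 256, 300, 2

# Hypothetical stand-ins for the decoder's tensors (shapes follow the repo's comments).
output = torch.randn(nq, bs, d_model)            # decoder hidden states
obj_center = torch.rand(nq, bs, 4)               # anchor boxes (x, y, w, h), already in (0, 1)
query_sine_embed = torch.randn(nq, bs, d_model)  # sinusoidal embedding of the anchor center

# Small head predicting a "reference" w, h from the decoder output -- the line in question.
ref_anchor_head = nn.Sequential(
    nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, 2)
)
refHW_cond = ref_anchor_head(output).sigmoid()   # nq, bs, 2

# Modulate the positional query: one half of the embedding is scaled by w_ref / w_anchor,
# the other half by h_ref / h_anchor, so the cross-attention map widens or shrinks with
# the predicted box size. No loss is attached to refHW_cond itself; its only gradient
# signal comes through this scaling of the attention's positional term.
query_sine_embed[..., d_model // 2:] *= (refHW_cond[..., 0] / obj_center[..., 2]).unsqueeze(-1)
query_sine_embed[..., :d_model // 2] *= (refHW_cond[..., 1] / obj_center[..., 3]).unsqueeze(-1)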
The results show that our models obtain performance gains with the modulated attention operation.