SAM-DETR

Question about emb_dim in the cross_attention module

Open · Bo396543018 opened this issue 2 years ago · 3 comments

Hi, I noticed that, compared with other DETR variants, the q and k dimensions in SAM-DETR's cross-attention are higher when using SPx8. Would it be fairer to compare using SPx1?

Bo396543018 · Jul 09 '22 08:07

Thanks for pointing this out.

In my experience, SPx8 still outperforms SPx1 even when an additional Linear layer is added to reduce the feature dimension. However, that variant introduces extra components, so we chose the design described in our paper and implemented in the code, which also delivers superior performance.
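
A minimal pure-Python sketch of the two options, under hypothetical assumptions: that SPx8 builds a query by concatenating the d-dimensional features of K = 8 salient points (giving a K*d-wide vector), and that the extra Linear layer maps K*d back down to d. The names `concat_point_features`, `linear`, and `linear_param_count` are illustrative and are not taken from the SAM-DETR code.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape (out_dim, in_dim)


def concat_point_features(point_feats: List[Vector]) -> Vector:
    """Concatenate K per-point d-dim features into one K*d-dim query (hypothetical SPxK layout)."""
    return [x for feat in point_feats for x in feat]


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """y = W x + b: a hypothetical extra Linear layer mapping K*d back down to d."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def linear_param_count(in_dim: int, out_dim: int) -> int:
    """Weights plus biases of a Linear(in_dim, out_dim) layer."""
    return in_dim * out_dim + out_dim


# Toy usage: K = 8 points with d = 4 features each.
feats = [[float(k)] * 4 for k in range(8)]
query = concat_point_features(feats)  # length 8 * 4 = 32
reduced = linear(query, [[1 / 32] * 32 for _ in range(4)], [0.0] * 4)  # length 4
print(len(query), len(reduced))  # 32 4
print(linear_param_count(8 * 256, 256))  # 524544
```

With d_model = 256 and K = 8, such a reduction layer alone would add 2048 × 256 + 256 = 524,544 parameters per use, which illustrates the kind of extra component this variant introduces.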

Note that our paper reports #Params and GFLOPs when comparing with other DETR variants. Higher q and k dimensions bring both higher AP and higher #Params and GFLOPs.
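
To make that trade-off concrete, here is a rough cost model for the query/key path of one cross-attention layer. It is a hypothetical sketch, not the paper's accounting: it assumes q and k are both d_model × K wide, that each passes through a square Linear projection with bias, and it uses made-up sizes of 300 queries and 1000 keys. The function name `qk_cost` is illustrative.

```python
def qk_cost(d_model: int, num_points: int, num_queries: int, num_keys: int):
    """Rough cost of the q/k path of one cross-attention layer.

    Hypothetical assumptions: q and k are both embed_dim = d_model * num_points
    wide, each goes through a square Linear projection with bias, and the
    value/output projections are ignored. Returns (parameters, multiply-accumulates).
    """
    e = d_model * num_points
    params = 2 * (e * e + e)  # W_q and W_k, plus their biases
    proj_macs = (num_queries + num_keys) * e * e  # projecting every query and key
    sim_macs = num_queries * num_keys * e  # Q @ K^T similarity scores
    return params, proj_macs + sim_macs


for k in (1, 8):
    params, macs = qk_cost(d_model=256, num_points=k, num_queries=300, num_keys=1000)
    print(f"SPx{k}: {params:,} params, {macs / 1e9:.2f} GMACs")
```

Under these assumptions the projection parameters grow roughly with K² (about 64× from SPx1 to SPx8), while the Q @ K^T similarity term grows only linearly with K.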

ZhangGongjie · Jul 11 '22 01:07

Thank you for your answer. I have another question: in SAM, why are two RoI operations needed to obtain q_content and q_content_point separately?

Bo396543018 · Jul 11 '22 15:07

I checked the code, and it turns out they are redundant: one RoI operation is enough.
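
A toy illustration of that simplification, assuming nothing about SAM-DETR's real tensors: the RoI crop is computed once and both `q_content` and `q_content_point` are derived from the same result. `CountingRoIPool` is a hypothetical stand-in for RoIAlign, and the two derivations (a mean and per-point sampling) are placeholders chosen only to show the reuse.

```python
from typing import List, Tuple

Grid = List[List[float]]  # 2-D single-channel feature map, indexed [y][x]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), integer coords, end-exclusive


class CountingRoIPool:
    """Toy stand-in for RoIAlign that crops a box and counts how often it runs."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, feature_map: Grid, box: Box) -> Grid:
        self.calls += 1
        x0, y0, x1, y1 = box
        return [row[x0:x1] for row in feature_map[y0:y1]]


def semantic_align(feature_map: Grid, box: Box,
                   salient_points: List[Tuple[int, int]],
                   roi_pool: CountingRoIPool) -> Tuple[float, List[float]]:
    # A single RoI operation; both outputs below reuse its result.
    region = roi_pool(feature_map, box)
    flat = [v for row in region for v in row]
    # Placeholder derivations (hypothetical): a pooled value and per-point samples.
    q_content = sum(flat) / len(flat)
    q_content_point = [region[y][x] for x, y in salient_points]  # RoI-local (x, y)
    return q_content, q_content_point


fmap = [[float(10 * y + x) for x in range(6)] for y in range(6)]
pool = CountingRoIPool()
q_content, q_content_point = semantic_align(fmap, (1, 1, 4, 4), [(0, 0), (2, 2)], pool)
print(pool.calls, q_content, q_content_point)
```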

ZhangGongjie · Jul 21 '22 03:07