Issues about Positional Embedding and Reference Point

Open tae-mo opened this issue 3 years ago • 1 comments

Hi, thanks for sharing your wonderful work.

I have a question about this function, https://github.com/Atten4Vis/ConditionalDETR/blob/ead865cbcf88be10175b79165df0836c5fcfc7e3/models/transformer.py#L33 which embeds positional information in the query_pos.

However, I don't understand why 2*(dim_t//2) has to be divided by 128 instead of the actual dimension that pos_tensor has (e.g., 256 by default). https://github.com/Atten4Vis/ConditionalDETR/blob/ead865cbcf88be10175b79165df0836c5fcfc7e3/models/transformer.py#L38 Does it work correctly even though dim_t is divided by 128?

I would appreciate being corrected if I am wrong!

Another question: when we compute equation (1) in the paper, https://github.com/Atten4Vis/ConditionalDETR/blob/ead865cbcf88be10175b79165df0836c5fcfc7e3/models/conditional_detr.py#L89 can I understand that the model learns "offsets" from the corresponding reference points? What is the precise role of the reference points?
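For concreteness, here is a minimal sketch of the step being asked about. It assumes that equation (1) has the form box = sigmoid(FFN(f) + [s, 0, 0]), where s is the unnormalized (inverse-sigmoid) reference point. The helper names `inverse_sigmoid` and `predict_box`, the `eps` value, and the example numbers are illustrative and not taken from the repository. Under that assumption, the FFN output for the center acts as an offset from the reference point in logit space.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def inverse_sigmoid(p, eps=1e-5):
    # Clamp to avoid log(0); eps is an assumed value for this sketch.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def predict_box(ffn_out, reference_xy):
    """ffn_out: 4 unnormalized values (cx, cy, w, h).
    reference_xy: normalized reference point (x, y) in [0, 1]."""
    s = [inverse_sigmoid(c) for c in reference_xy]
    logits = list(ffn_out)
    # Only the center coordinates are shifted by the reference point.
    logits[0] += s[0]
    logits[1] += s[1]
    return [sigmoid(v) for v in logits]

# With a zero center offset, the predicted center falls on the reference point.
box = predict_box([0.0, 0.0, -1.0, -1.0], (0.3, 0.6))
print(box[:2])
```

In this sketch, width and height are predicted directly, while the center is predicted relative to the reference point.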

Thank you!

Oct 22 '22 12:10 tae-mo

Hi, regarding question (1), why 2*(dim_t//2) has to be divided by 128: the position embedding is computed separately for the x and y directions, each with 128 dimensions, and the two are then concatenated into a 256-dimensional embedding.
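A rough pure-Python sketch of that idea for a single 2-D point, not the repository's code: each coordinate gets its own 128-channel sinusoidal embedding, so the exponent is normalized by 128, and the two halves are concatenated to 256 channels. The function names, the temperature of 10000, the 2π scale, and the (y, x) concatenation order are assumptions made for illustration.

```python
import math

def sine_embed_1d(coord, num_feats=128, temperature=10000.0):
    """Sinusoidal embedding of one normalized coordinate in [0, 1]."""
    scaled = coord * 2 * math.pi
    embed = []
    for i in range(num_feats):
        # Channels 2k and 2k+1 share one frequency; the exponent is
        # normalized by the per-axis width (128), not the full 256.
        freq = temperature ** (2 * (i // 2) / num_feats)
        angle = scaled / freq
        embed.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return embed

def sine_embed_2d(x, y, d_model=256):
    half = d_model // 2  # 128 channels per axis
    # Concatenate the per-axis embeddings (y first is an assumption here).
    return sine_embed_1d(y, half) + sine_embed_1d(x, half)

emb = sine_embed_2d(0.25, 0.75)
print(len(emb))  # 256
```

Dividing by 256 instead would make each 128-channel half cover only the lower half of the frequency exponent range.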

Sep 14 '23 03:09 Run542968