Question about the sequence of decoder operations

Open tpwpfrnl opened this issue 1 year ago • 1 comments

Hi, I have a question regarding the sequence of decoder operations in your config file.

Based on your code, my guess is that the sequence of operations is: self_attn -> cross_attn -> ffn -> MultiScaleDeformableAttention. However, when I read the paper, my understanding was that the sequence should be: MultiheadSelfAttention -> MultiScaleDeformableAttention -> ffn.

https://github.com/Sense-X/Co-DETR/blob/2d59a3038533d00732275a0f5d31cf5ff0b540ad/projects/configs/co_deformable_detr/co_deformable_detr_r50_1x_coco.py#L68C1-L89C56

Could you explain whether I misunderstood the paper or the code?

Nov 22 '24 12:11 tpwpfrnl

We follow the decoder design of Deformable-DETR, where the operation order in each decoder layer is: self_attn -> cross_attn -> ffn. Specifically, self_attn is implemented by MultiheadSelfAttention and cross_attn is implemented by MultiScaleDeformableAttention, so MultiScaleDeformableAttention is not an extra step after ffn. This is the same order as the one you describe from the paper.
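To make the mapping concrete, here is a minimal sketch in plain Python. It assumes the usual mmcv-style layer config, where `attn_cfgs` lists the attention modules and each attention entry in `operation_order` consumes the next one in list order. The helper `resolve_operations` and the config values below are hypothetical and written for illustration; they are not copied from the linked config file.

```python
def resolve_operations(operation_order, attn_cfgs, ffn_type="FFN"):
    """Pair each step in operation_order with the module type that runs it.

    Hypothetical helper: it only mimics the idea that attention configs
    are consumed in list order, one per attention step.
    """
    resolved = []
    attn_index = 0  # index of the next unused entry in attn_cfgs
    for op in operation_order:
        if op in ("self_attn", "cross_attn"):
            resolved.append((op, attn_cfgs[attn_index]["type"]))
            attn_index += 1
        elif op == "ffn":
            resolved.append((op, ffn_type))
        elif op == "norm":
            resolved.append((op, "LayerNorm"))
        else:
            raise ValueError(f"unknown operation: {op}")
    return resolved


# Illustrative decoder layer config (values are assumptions, not the repo's).
decoder_layer_cfg = dict(
    attn_cfgs=[
        dict(type="MultiheadAttention", embed_dims=256, num_heads=8),
        dict(type="MultiScaleDeformableAttention", embed_dims=256),
    ],
    operation_order=("self_attn", "norm", "cross_attn", "norm", "ffn", "norm"),
)

resolved = resolve_operations(
    decoder_layer_cfg["operation_order"], decoder_layer_cfg["attn_cfgs"]
)
for op, module_type in resolved:
    print(f"{op:>10} -> {module_type}")
```

Under these assumptions, the second attention config is what runs at the cross_attn step. The `attn_cfgs` list is therefore not a fourth operation appended after ffn; it only names the modules that the attention steps in `operation_order` use.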

Dec 13 '24 16:12 TempleX98