
The meaning of pre_assign_attn

Opened by dingjiansw101

Thank you for sharing such nice work. I have some questions. (1) What is the meaning of self.pre_assign_attn? Is it described in the paper? https://github.com/NVlabs/GroupViT/blob/b4ef51b8ae997f4741811025ac2290df3423a27a/models/group_vit.py#L284

(2) Does self.assign represent equation (3), (4), and part of (5) in the paper?

dingjiansw101 avatar Apr 03 '22 22:04 dingjiansw101

Hi @dingjiansw101 (1) pre_assign_attn is used to aggregate information from image tokens to group tokens. It is omitted in the paper for simplicity. (2) Yes.

xvjiarui avatar Apr 05 '22 01:04 xvjiarui
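
[Editor's note: a minimal sketch of how pre_assign_attn and assign could fit together in a Grouping Block, based on the description above. Module names, shapes, and the normalization details are assumptions, not the repo's exact code; see models/group_vit.py for the real implementation.]

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GroupingBlockSketch(nn.Module):
    def __init__(self, dim, num_heads):
        super().__init__()
        # pre_assign_attn: cross-attention where group tokens are queries and
        # image tokens are keys/values, so each group token aggregates
        # information from the image tokens before the hard assignment.
        self.pre_assign_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        # Projections used to compute assignment logits (Eq. 3 in the paper).
        self.proj_group = nn.Linear(dim, dim)
        self.proj_image = nn.Linear(dim, dim)

    def forward(self, group_tokens, image_tokens, tau=1.0):
        # (1) Cross-attention aggregation (omitted in the paper for simplicity).
        q = self.norm_q(group_tokens)
        kv = self.norm_kv(image_tokens)
        group_tokens = group_tokens + self.pre_assign_attn(q, kv, kv)[0]

        # (2) Assignment: similarity logits between image and group tokens,
        # then a straight-through Gumbel-softmax over groups (Eqs. 3-5).
        logits = self.proj_image(image_tokens) @ self.proj_group(group_tokens).transpose(1, 2)
        assign = F.gumbel_softmax(logits, tau=tau, hard=True, dim=-1)  # (B, N_img, N_grp)

        # (3) New segment tokens: assignment-weighted average of image tokens.
        new_group_tokens = assign.transpose(1, 2) @ image_tokens
        new_group_tokens = new_group_tokens / (assign.sum(dim=1).unsqueeze(-1) + 1e-6)
        return new_group_tokens, assign
```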

Thanks for the reply. But I still have a small question about (1). From my understanding, the Transformer layers before the Grouping Block already propagate information among tokens. Why do we need the extra pre_assign_attn for aggregation? What is the difference between them?

dingjiansw101 avatar Apr 05 '22 10:04 dingjiansw101

Hi @dingjiansw101 Yes, you are correct. The only difference is that pre_assign_attn is cross-attention, while the Transformer layers use self-attention. pre_assign_attn is there for legacy reasons. We didn't ablate whether it is important for our model, but my guess is no.

xvjiarui avatar Apr 05 '22 21:04 xvjiarui
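
[Editor's note: to make the distinction concrete, a short sketch of how the query/key/value sources differ between the regular Transformer layers and pre_assign_attn. Shapes and module choices here are illustrative assumptions, not the repo's exact code.]

```python
import torch
import torch.nn as nn

B, N_grp, N_img, D = 2, 8, 196, 256
group_tokens = torch.randn(B, N_grp, D)
image_tokens = torch.randn(B, N_img, D)
attn = nn.MultiheadAttention(D, num_heads=4, batch_first=True)

# Transformer layers before the Grouping Block: self-attention over the
# concatenation of group and image tokens; every token attends to every token.
x = torch.cat([group_tokens, image_tokens], dim=1)
self_out, _ = attn(x, x, x)

# pre_assign_attn: cross-attention; only group tokens act as queries, so
# information flows one way, from image tokens into group tokens.
cross_out, _ = attn(group_tokens, image_tokens, image_tokens)
print(self_out.shape, cross_out.shape)  # (B, N_grp + N_img, D) vs (B, N_grp, D)
```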

A bit excited to find someone with the same question in the middle of the night.

l784669877 avatar Nov 09 '23 16:11 l784669877