GroupViT
The meaning of pre_assign_attn
Thank you for sharing such nice work. I have some questions. (1) What is the meaning of self.pre_assign_attn? Is it described in the paper? https://github.com/NVlabs/GroupViT/blob/b4ef51b8ae997f4741811025ac2290df3423a27a/models/group_vit.py#L284
(2) Does self.assign represent equation (3), (4), and part of (5) in the paper?
Hi @dingjiansw101
(1) pre_assign_attn is used to aggregate information from image tokens to group tokens. It is omitted in the paper for simplicity.
(2) Yes.
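For concreteness, here is a rough sketch of the kind of hard assignment Eqs. (3)-(5) describe: each image token is assigned to one group with Gumbel-Softmax (straight-through one-hot), and image features are then pooled into their assigned group token. This is only an illustrative reimplementation under those assumptions; the module and parameter names are made up and it is not the exact code in group_vit.py.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GumbelAssign(nn.Module):
    """Illustrative sketch of Eqs. (3)-(5): hard-assign each image token to one
    group via Gumbel-Softmax, then pool image features into the assigned group.
    Not the actual GroupViT implementation."""

    def __init__(self, dim, tau=1.0):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)  # projects group tokens (queries)
        self.k_proj = nn.Linear(dim, dim)  # projects image tokens (keys)
        self.v_proj = nn.Linear(dim, dim)  # projects image tokens (values)
        self.tau = tau

    def forward(self, group_tokens, image_tokens):
        # group_tokens: (B, G, D), image_tokens: (B, N, D)
        logits = torch.einsum('bgd,bnd->bgn',
                              self.q_proj(group_tokens),
                              self.k_proj(image_tokens))
        # soft assignment over groups, hard one-hot in the forward pass (Eqs. 3-4)
        assign = F.gumbel_softmax(logits, tau=self.tau, hard=True, dim=1)
        # normalize per group, then pool the assigned image features (part of Eq. 5)
        assign = assign / (assign.sum(dim=-1, keepdim=True) + 1e-6)
        pooled = torch.einsum('bgn,bnd->bgd', assign, self.v_proj(image_tokens))
        return group_tokens + pooled  # residual update of the group/segment tokens
```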
Thanks for the reply, but I still have a small question about (1). From my understanding, the Transformer layers before the Grouping Block already propagate information between tokens. Why do we need the extra pre_assign_attn for aggregation? What is the difference between them?
Hi @dingjiansw101
Yes, you are correct. The only difference is that pre_assign_attn uses cross attention rather than self attention. It is there for some legacy reasons; we didn't ablate whether it is important for our model, but I guess not.
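To make the contrast concrete, below is a minimal sketch of a pre_assign_attn-style cross-attention update, assuming group tokens act as queries over the image tokens as keys/values. The class and argument names are hypothetical and this is not the repo's implementation.

```python
import torch
import torch.nn as nn

class PreAssignCrossAttn(nn.Module):
    """Illustrative sketch: group tokens (queries) aggregate features from
    image tokens (keys/values) via cross attention, unlike the preceding
    Transformer layers where all tokens attend to each other with self attention."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, group_tokens, image_tokens):
        # group_tokens: (B, G, D), image_tokens: (B, N, D)
        q = self.norm_q(group_tokens)
        kv = self.norm_kv(image_tokens)
        out, _ = self.attn(q, kv, kv)  # cross attention: groups gather image features
        return group_tokens + out      # residual update of the group tokens
```

By contrast, a regular Transformer layer would run self attention over the concatenation of image and group tokens, e.g. on torch.cat([image_tokens, group_tokens], dim=1), so every token attends to every other token.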
Found someone with the same question in the middle of the night. A bit excited.