
Unnecessary proj in WindowAttention?

Open askerlee opened this issue 2 years ago • 0 comments

We know that two linear transformations applied in a row can be merged into a single linear transformation if there is no activation function between them. In https://github.com/microsoft/Swin-Transformer/blob/main/models/swin_transformer.py#L141-L142:

x = (attn @ v).transpose(1, 2).reshape(B_, N, C)
x = self.proj(x)

v is obtained from a linear layer (the v slice of qkv). Then proj(attn @ v) should be equivalent to attn @ proj(v), shouldn't it? In theory, proj could then be folded into the v part of the qkv projection, since the two are consecutive linear transformations. So it seems self.proj is not strictly necessary, except that the extra dropout applied after it may somehow add feature robustness. Would there be any performance degradation if we removed self.proj? Thanks.
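
For reference, here is a minimal sketch (my own, not from the repo) of the premise the question relies on: two stacked nn.Linear layers with no activation in between can be folded into one by merging their weights and biases. f1 and f2 below are placeholder layers standing in for the v slice of qkv and for self.proj, and dim is an arbitrary example width.

# Sketch: fold two consecutive linear layers into one.
# y = W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
import torch
import torch.nn as nn

torch.manual_seed(0)
dim = 96
f1 = nn.Linear(dim, dim)   # stands in for the v slice of qkv (assumption)
f2 = nn.Linear(dim, dim)   # stands in for self.proj (assumption)

merged = nn.Linear(dim, dim)
with torch.no_grad():
    merged.weight.copy_(f2.weight @ f1.weight)
    merged.bias.copy_(f2.weight @ f1.bias + f2.bias)

x = torch.randn(4, 49, dim)  # (windows, tokens, channels)
print(torch.allclose(f2(f1(x)), merged(x), atol=1e-5))  # True

This only illustrates the general merging identity; whether dropping self.proj hurts accuracy in practice is exactly the question above.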

askerlee · Dec 08 '21 07:12