transformer-in-transformer
patch_tokens vs patch_pos_emb
Hi!
I'm trying to understand your TNT implementation, and one thing that got me a bit confused is why there are two parameters, patch_tokens and patch_pos_emb, which seem to have the same purpose: to encode patch position. Isn't one of them redundant?
self.patch_tokens = nn.Parameter(torch.randn(num_patch_tokens + 1, patch_dim))
self.patch_pos_emb = nn.Parameter(torch.randn(num_patch_tokens + 1, patch_dim))
...
patches = repeat(self.patch_tokens[:(n + 1)], 'n d -> b n d', b = b)
patches += rearrange(self.patch_pos_emb[:(n + 1)], 'n d -> () n d')
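For reference, here is a minimal plain-PyTorch sketch (hypothetical sizes; the einops calls rewritten with unsqueeze/expand, not the repo's actual code) of how the two parameters end up combined:

```python
import torch
import torch.nn as nn

b, n, patch_dim = 2, 4, 8  # hypothetical batch size, patch count, dim

# Both are learned tensors of the same shape (num_patch_tokens + 1, patch_dim)
patch_tokens = nn.Parameter(torch.randn(n + 1, patch_dim))
patch_pos_emb = nn.Parameter(torch.randn(n + 1, patch_dim))

# repeat(patch_tokens[:(n + 1)], 'n d -> b n d', b=b) in plain PyTorch:
patches = patch_tokens[:(n + 1)].unsqueeze(0).expand(b, -1, -1).clone()

# rearrange(patch_pos_emb[:(n + 1)], 'n d -> () n d') adds a batch axis,
# then broadcasting applies it to every item in the batch:
patches = patches + patch_pos_emb[:(n + 1)].unsqueeze(0)

print(patches.shape)  # torch.Size([2, 5, 8])
```

Both parameters are broadcast-added to every batch element in the same way, which is what makes them look interchangeable at a glance.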