Swin-Transformer
ape (absolute positional embedding) is set to False by default, is it OK?
The parameter ape, which stands for adding an absolute position embedding to the patch embedding, is set to False by default in the config file.
To my knowledge, transformer models need positional embeddings in their input to identify where tokens are, so I wonder whether setting ape=False is proper.
Does this mean the Swin Transformer model built by default is not sensitive to the position of each patch?
And if so, would this influence the performance of the fine-tuned model?
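For reference, this is roughly how the flag appears in the repo's yacs config (key path quoted from memory, so the exact layout may differ slightly):

```python
# Sketch of the relevant yacs config entry; treat key names as approximate.
from yacs.config import CfgNode as CN

_C = CN()
_C.MODEL = CN()
_C.MODEL.SWIN = CN()
# If True, add an absolute position embedding to the patch embedding
_C.MODEL.SWIN.APE = False
```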
Hi @feiyangsuo I was also looking into the positional embedding usage; they also use a relative positional bias: https://github.com/microsoft/Swin-Transformer/blob/eed077f68e0386e8cdff2e1981492699d9c190c0/models/swin_transformer.py#L89
It is a learnable matrix indexed by the relative positions within a window, which gets added to the attention matrix. I think this is how they can drop the absolute positional embedding. In general there also exist relative positional embedding schemes that other vision transformer architectures use in addition to a relative positional bias.
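To make that concrete, here is a rough, self-contained sketch of how such a window-level relative position bias can be computed and added to attention scores. It is simplified from the linked file; variable names and the dummy attention tensor are mine, not the repo's:

```python
import torch
import torch.nn as nn

window_size = 7   # tokens per window side (Swin-T default)
num_heads = 3

# One learnable bias per (relative offset, head) pair: offsets along each
# axis range over [-(W-1), W-1], giving (2W-1)*(2W-1) table entries.
bias_table = nn.Parameter(torch.zeros((2 * window_size - 1) ** 2, num_heads))

# Precompute, for every pair of tokens in a window, its index into the table.
coords = torch.stack(torch.meshgrid(
    torch.arange(window_size), torch.arange(window_size), indexing="ij"))
coords_flat = coords.flatten(1)                          # (2, W*W)
rel = coords_flat[:, :, None] - coords_flat[:, None, :]  # (2, N, N) pairwise offsets
rel = rel.permute(1, 2, 0) + (window_size - 1)           # shift offsets to be >= 0
rel_index = rel[:, :, 0] * (2 * window_size - 1) + rel[:, :, 1]  # (N, N)

# During attention, the gathered bias is added to the raw score matrix.
N = window_size * window_size
attn = torch.randn(1, num_heads, N, N)                   # dummy QK^T scores
bias = bias_table[rel_index.view(-1)].view(N, N, num_heads)
attn = attn + bias.permute(2, 0, 1).unsqueeze(0)         # (1, heads, N, N)
```

Because the bias depends only on the offset between two tokens, every window shares the same table, much like a convolution kernel shares weights across spatial locations.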
@feiyangsuo Yes, ape is set to False by default, as we found that ape brings no benefit to general visual recognition problems. If you want to use this feature, you can initialize the ape with zero vectors, so that you can directly leverage the pre-trained models.
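A minimal sketch of that suggestion (illustrative shapes and names, not the repo's code): starting the absolute position embedding at zero makes the addition a no-op at first, so weights pre-trained with ape=False load and behave identically at the start of fine-tuning.

```python
import torch
import torch.nn as nn

num_patches, embed_dim = 56 * 56, 96   # illustrative Swin-T stage-1 numbers

# Zero init: adding it changes nothing until it learns non-zero offsets.
absolute_pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))

x = torch.randn(2, num_patches, embed_dim)  # patch embeddings (batch of 2)
x = x + absolute_pos_embed                   # identity at initialization
```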
I got it, thanks. So Swin Transformer is actually sensitive to relative position rather than absolute position, which is somewhat like convolution.