Swin-Transformer
How do the mask and position bias influence the training accuracy?
Hi, thanks for your great work. I have a small question: how important are the attention mask and the relative position bias to the final training accuracy? I find that if I don't use the mask and position bias, the final accuracy still gets close to 84%, but training becomes much faster. So if we are not that sensitive to accuracy, is dropping the mask and position bias a better choice? Could you please help me check my judgement about this tradeoff?
To ignore the mask and position bias, the code modifications are as follows (a short sketch is given after this list):
- ignore the pos bias: delete the relative position bias term here
- ignore the mask: set `mask=None` here: https://github.com/microsoft/Swin-Transformer/blob/main/models/swin_transformer.py#L273
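For reference, here is a minimal sketch of what the ablated attention looks like. The module name `WindowAttentionSketch` and the `use_pos_bias` flag are my own additions for illustration, not part of the official code; the tensor shapes and the way the bias and mask are added follow the pattern in `swin_transformer.py`:

```python
import torch
import torch.nn as nn


class WindowAttentionSketch(nn.Module):
    # Hypothetical minimal module for the ablation; `use_pos_bias` is an
    # illustrative flag, not part of the official implementation.
    def __init__(self, dim, window_size, num_heads, use_pos_bias=True):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.use_pos_bias = use_pos_bias
        if use_pos_bias:
            # one learnable bias per relative position, per head
            self.relative_position_bias_table = nn.Parameter(
                torch.zeros((2 * window_size - 1) ** 2, num_heads))
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, mask=None, rel_pos_index=None):
        # x: (num_windows * batch, window_size**2, dim)
        B_, N, C = x.shape
        qkv = self.qkv(x).reshape(B_, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4).unbind(0)
        attn = (q * self.scale) @ k.transpose(-2, -1)  # (B_, heads, N, N)
        if self.use_pos_bias and rel_pos_index is not None:
            # look up one learned bias per (query, key) relative offset
            bias = self.relative_position_bias_table[rel_pos_index.view(-1)]
            attn = attn + bias.view(N, N, -1).permute(2, 0, 1).unsqueeze(0)
        if mask is not None:
            # mask: (num_windows, N, N); passing mask=None disables masking
            nW = mask.shape[0]
            attn = attn.view(B_ // nW, nW, self.num_heads, N, N) \
                + mask.unsqueeze(1).unsqueeze(0)
            attn = attn.view(-1, self.num_heads, N, N)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B_, N, C)
        return self.proj(out)
```

Calling this with `mask=None` and `use_pos_bias=False` reproduces the ablation: attention is still computed per window, only the learned bias and the shift mask are skipped.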
Also, I'm a little confused about the use of `attn_mask`. Using `attn_mask` restricts attention to operate within each individual window, while it seems to me that letting attention operate on adjacent pixels is reasonable. After all, the shifted window is meant to introduce cross-window connections, so we don't need to mask out pixels (tokens) that come from other windows. If we delete `attn_mask`, the computational complexity remains the same (linear in height and width), since attention is still computed window-by-window and the mask only adds a bias to the attention logits.
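For context, this is roughly how the mask arises (a sketch following the mask-construction logic in `swin_transformer.py`; the function name `build_shift_mask` is mine). After the cyclic `torch.roll`, windows along the border contain tokens that were originally far apart in the image, and the mask blocks attention between those wrapped-around regions:

```python
import torch


def build_shift_mask(H, W, window_size, shift_size):
    # Assign each pixel a region id; regions separated by the cyclic shift
    # get different ids (assumes H, W divisible by window_size and
    # 0 < shift_size < window_size).
    img_mask = torch.zeros(1, H, W, 1)
    h_slices = (slice(0, -window_size),
                slice(-window_size, -shift_size),
                slice(-shift_size, None))
    cnt = 0
    for h in h_slices:
        for w in h_slices:
            img_mask[:, h, w, :] = cnt
            cnt += 1
    # partition into non-overlapping windows: (num_windows, ws*ws)
    mask_windows = img_mask.view(1, H // window_size, window_size,
                                 W // window_size, window_size, 1)
    mask_windows = mask_windows.permute(0, 1, 3, 2, 4, 5) \
                               .reshape(-1, window_size * window_size)
    # pairs of tokens with different region ids get a large negative bias
    attn_mask = mask_windows.unsqueeze(1) - mask_windows.unsqueeze(2)
    return attn_mask.masked_fill(attn_mask != 0, float(-100.0))
```

So the per-window attention cost is unchanged with or without the mask, which is why deleting it keeps the overall complexity linear in H*W; the difference is only that, without it, tokens wrapped around by the roll can attend to each other even though they are not actually adjacent in the image.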