Swin-Transformer
How do the mask and position bias influence the training accuracy?
Hi, thanks for your great work. I have a small question: how important are the attention mask and the relative position bias to the final training accuracy? I find that if I don't use the mask and position bias, the final accuracy still gets close to 84%, but training becomes much faster. So if we are not that sensitive to accuracy, is dropping the mask and position bias a better choice? Could you please help me check my judgement about this tradeoff?
To ignore the mask and position bias, the code modifications are as follows (a short sketch is given after this list):
- ignore the pos bias: delete the relative position bias term here
- ignore the mask: set `mask=None` here: https://github.com/microsoft/Swin-Transformer/blob/main/models/swin_transformer.py#L273
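For reference, here is a minimal sketch of what the ablated attention looks like. The module name `WindowAttentionSketch` and the `use_pos_bias` flag are my own additions for illustration, not part of the official code; the tensor shapes and the way the bias and mask are added follow the pattern in `swin_transformer.py`:

```python
import torch
import torch.nn as nn


class WindowAttentionSketch(nn.Module):
    # Hypothetical minimal module for the ablation; `use_pos_bias` is an
    # illustrative flag, not part of the official implementation.
    def __init__(self, dim, window_size, num_heads, use_pos_bias=True):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.use_pos_bias = use_pos_bias
        if use_pos_bias:
            # one learnable bias per relative position, per head
            self.relative_position_bias_table = nn.Parameter(
                torch.zeros((2 * window_size - 1) ** 2, num_heads))
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, mask=None, rel_pos_index=None):
        # x: (num_windows * batch, window_size**2, dim)
        B_, N, C = x.shape
        qkv = self.qkv(x).reshape(B_, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4).unbind(0)
        attn = (q * self.scale) @ k.transpose(-2, -1)  # (B_, heads, N, N)
        if self.use_pos_bias and rel_pos_index is not None:
            # look up one learned bias per (query, key) relative offset
            bias = self.relative_position_bias_table[rel_pos_index.view(-1)]
            attn = attn + bias.view(N, N, -1).permute(2, 0, 1).unsqueeze(0)
        if mask is not None:
            # mask: (num_windows, N, N); passing mask=None disables masking
            nW = mask.shape[0]
            attn = attn.view(B_ // nW, nW, self.num_heads, N, N) \
                + mask.unsqueeze(1).unsqueeze(0)
            attn = attn.view(-1, self.num_heads, N, N)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B_, N, C)
        return self.proj(out)
```

Calling this with `mask=None` and `use_pos_bias=False` reproduces the ablation: attention is still computed per window, only the learned bias and the shift mask are skipped.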
Also, I'm a little confused about the use of `attn_mask`. Using `attn_mask` restricts attention to operate within each individual window, while it seems to me that letting attention operate on adjacent pixels is reasonable. After all, the shifted window is meant to introduce cross-window connections, so we don't need to mask out pixels (tokens) that come from other windows. If we delete `attn_mask`, the computational complexity remains the same (linear in height and width), since attention is still computed window-by-window and the mask only adds a bias to the attention logits.
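For context, this is roughly how the mask arises (a sketch following the mask-construction logic in `swin_transformer.py`; the function name `build_shift_mask` is mine). After the cyclic `torch.roll`, windows along the border contain tokens that were originally far apart in the image, and the mask blocks attention between those wrapped-around regions:

```python
import torch


def build_shift_mask(H, W, window_size, shift_size):
    # Assign each pixel a region id; regions separated by the cyclic shift
    # get different ids (assumes H, W divisible by window_size and
    # 0 < shift_size < window_size).
    img_mask = torch.zeros(1, H, W, 1)
    h_slices = (slice(0, -window_size),
                slice(-window_size, -shift_size),
                slice(-shift_size, None))
    cnt = 0
    for h in h_slices:
        for w in h_slices:
            img_mask[:, h, w, :] = cnt
            cnt += 1
    # partition into non-overlapping windows: (num_windows, ws*ws)
    mask_windows = img_mask.view(1, H // window_size, window_size,
                                 W // window_size, window_size, 1)
    mask_windows = mask_windows.permute(0, 1, 3, 2, 4, 5) \
                               .reshape(-1, window_size * window_size)
    # pairs of tokens with different region ids get a large negative bias
    attn_mask = mask_windows.unsqueeze(1) - mask_windows.unsqueeze(2)
    return attn_mask.masked_fill(attn_mask != 0, float(-100.0))
```

So the per-window attention cost is unchanged with or without the mask, which is why deleting it keeps the overall complexity linear in H*W; the difference is only that, without it, tokens wrapped around by the roll can attend to each other even though they are not actually adjacent in the image.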