Mismatch Between Pre-trained Weights and Model Structure in SwinV2-Tiny Encoder: relative_coords_table/relative_position_index/attn_mask

Open sIHURs opened this issue 1 year ago • 0 comments

Hello everyone,

I want to use the SwinV2 model as an encoder in my project, specifically the tiny version (swinv2_tiny_patch4_window8_256). However, after comparing the pre-trained weights provided in the README with the model's structure, I found that the following entries in the pre-trained weights are not present in the model:

layers.0.blocks.1.attn_mask: torch.Size([64, 64, 64])  
layers.0.blocks.0.attn.relative_coords_table: torch.Size([1, 15, 15, 2])  
layers.0.blocks.0.attn.relative_position_index: torch.Size([64, 64])
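
For context, a comparison of this kind can be sketched roughly as below. This is only a minimal illustration, not the exact script I ran: `diff_state_dict_keys` is a hypothetical helper, and the key lists are stand-ins for the keys of the loaded checkpoint and of `model.state_dict()`.

```python
def diff_state_dict_keys(checkpoint_keys, model_keys):
    """Return (only_in_checkpoint, only_in_model) as sorted lists of key names."""
    ckpt, model = set(checkpoint_keys), set(model_keys)
    return sorted(ckpt - model), sorted(model - ckpt)


# Hypothetical key lists for illustration only. In practice they would come
# from the loaded checkpoint dict and from model.state_dict().keys().
checkpoint_keys = [
    "layers.0.blocks.0.attn.logit_scale",
    "layers.0.blocks.0.attn.relative_coords_table",
    "layers.0.blocks.0.attn.relative_position_index",
    "layers.0.blocks.1.attn_mask",
]
model_keys = ["layers.0.blocks.0.attn.logit_scale"]

only_in_checkpoint, only_in_model = diff_state_dict_keys(checkpoint_keys, model_keys)
for key in only_in_checkpoint:
    print("only in checkpoint:", key)
```

Note that in PyTorch, `model.named_parameters()` lists only parameters, while `model.state_dict()` also includes persistent buffers, so which of the two is used for the model side can change what shows up as missing.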

Could someone please help explain this? I would greatly appreciate it!

Yifan

Dec 01 '24 12:12 sIHURs