Swin-Transformer: How can we set the window size for a 112x112 image?

Hi

Thank you for your great work. My image size is 112x112, the head count is 12, and my window size is 7. It does not work for me.

Traceback

  File "/raid/khawar/PycharmProjects/thesis/vit_pytorch/SwinT/swin.py", line 111, in forward
    q, k, v = map(
  File "/raid/khawar/PycharmProjects/thesis/vit_pytorch/SwinT/swin.py", line 112, in <lambda>
    lambda t: rearrange(t, 'b (nw_h w_h) (nw_w w_w) (h d) -> b h (nw_h nw_w) (w_h w_w) d',
  File "/raid/khawar/anaconda3/envs/vision-transformer-pytorch/lib/python3.8/site-packages/einops/einops.py", line 424, in rearrange
    return reduce(tensor, pattern, reduction='rearrange', **axes_lengths)
  File "/raid/khawar/anaconda3/envs/vision-transformer-pytorch/lib/python3.8/site-packages/einops/einops.py", line 376, in reduce
    raise EinopsError(message + '\n {}'.format(e))
einops.EinopsError:  Error while processing rearrange-reduction pattern "b (nw_h w_h) (nw_w w_w) (h d) -> b h (nw_h nw_w) (w_h w_w) d".
 Input tensor shape: torch.Size([1, 3, 3, 384]). Additional info: {'h': 24, 'w_h': 7, 'w_w': 7}.
 Shape mismatch, can't divide axis of length 3 in chunks of 7

Regards, Khawar

Jul 11 '21 13:07 khawar-islam

I noticed that many people use 224x224 images as their dataset, and 224 is divisible by 7. I also ran into this problem when using 512x512 images. To make the sizes divisible, I changed the default window_size=7 in model/swin_transformer to 8 :)

Aug 08 '21 11:08 Nial4

@Nial4 I tried the same for size 112, but it is still not working. Any advice?

Aug 08 '21 11:08 khawar-islam

The image size needs to be divisible by 4, and from stage 1 to stage 3 the feature map size needs to be divisible by the window size (7 by default). So I suggest you try resizing the image to 128x128, and then change the SwinTransformer init and the WindowAttention init to use window_size=8.

Aug 08 '21 12:08 Nial4
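
The divisibility check described above can be sketched in a few lines. This is a minimal illustration, assuming a 4x patch embedding followed by three 2x downsampling steps with floor division (which matches the 3x3 feature map in the traceback); the function names `stage_sizes` and `bad_stages` are made up for this example and do not come from any repository.

```python
def stage_sizes(image_size, downscale=(4, 2, 2, 2)):
    """Return the feature-map side length after each stage (assumed factors)."""
    sizes = []
    size = image_size
    for factor in downscale:
        size //= factor  # floor division, e.g. 7 // 2 == 3
        sizes.append(size)
    return sizes


def bad_stages(image_size, window_size, downscale=(4, 2, 2, 2)):
    """Return (stage_number, side_length) for stages the window does not divide."""
    return [
        (stage, side)
        for stage, side in enumerate(stage_sizes(image_size, downscale), start=1)
        if side % window_size != 0
    ]


print(stage_sizes(112))    # [28, 14, 7, 3]
print(bad_stages(112, 7))  # [(4, 3)]
print(bad_stages(224, 7))  # []
print(bad_stages(128, 8))  # [(4, 4)]
```

For 112x112 input only the last stage fails, which is the "can't divide axis of length 3 in chunks of 7" error. For 128x128 with window_size=8 the first three stages pass, but the last stage is 4x4, so it still needs a smaller window, padding, or an implementation that shrinks the window to the feature map size.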

You may change the window size of the last stage to 3x3 or 4x4 (the feature map size). Another solution is to use padding.

Aug 12 '21 08:08 ancientmooner
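
Both suggestions come down to simple arithmetic on the feature map size. The sketch below is only illustrative: `pad_amount` and `clamp_window` are hypothetical helper names, and the padding itself (plus masking or cropping the padded positions afterwards) is not shown.

```python
def pad_amount(size, window_size):
    """Extra rows/columns needed so that `size` becomes a multiple of window_size."""
    return (window_size - size % window_size) % window_size


def clamp_window(size, window_size):
    """Use the whole feature map as one window when it is smaller than the window."""
    return min(size, window_size)


# Last stage of a 112x112 input: a 3x3 feature map with window_size=7.
print(pad_amount(3, 7))    # 4  -> pad 3x3 up to 7x7
print(pad_amount(14, 7))   # 0  -> already divisible, no padding
print(clamp_window(3, 7))  # 3  -> use a 3x3 window instead
```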

Hi @Nial4, following your suggestion, I changed window_size to 8 when training the model on 512x512 images. When loading the pre-trained weights I now get a size mismatch for relative_position_index and for the head. Will these changes reduce the performance of the Swin Transformer? And how can I deal with them? I set strict=False, but I still get an error when loading the pre-trained weights.

Is there any documentation on fine-tuning the Swin Transformer? I am not sure how to approach it. Thanks in advance for your reply!

Aug 17 '21 16:08 scott870430
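
To see why changing the window size causes a mismatch, it helps to write out the shapes that depend on it. This is a sketch assuming the parameter layout of the official implementation, where the bias table has one row per relative offset and the index buffer has one entry per pair of positions in a window; `window_param_shapes` is a name invented for this example.

```python
def window_param_shapes(window_size, num_heads):
    """Shapes of the window-dependent tensors (assumed official layout)."""
    tokens = window_size * window_size  # positions inside one window
    side = 2 * window_size - 1          # distinct relative offsets per axis
    return {
        "relative_position_bias_table": (side * side, num_heads),
        "relative_position_index": (tokens, tokens),
    }


print(window_param_shapes(7, 3))
# {'relative_position_bias_table': (169, 3), 'relative_position_index': (49, 49)}
print(window_param_shapes(8, 3))
# {'relative_position_bias_table': (225, 3), 'relative_position_index': (64, 64)}
```

Note that in PyTorch, `load_state_dict(..., strict=False)` only tolerates missing and unexpected keys. A key that exists in both the checkpoint and the model but has a different shape still raises an error, so the mismatched entries have to be dropped or resized before loading.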

Hi @scott870430, the parameters in the pre-trained model are fixed and you cannot modify them. You can either pre-process your dataset (for example, resize the images) or change the window size and retrain. I have recently used Swin in several models, and I don’t think this will significantly affect its performance.

Oct 18 '21 05:10 Nial4

Hi @Nial4, thanks for your reply. Does your retraining start from the pre-trained weights? If I modify the window size and retrain, can I still use the pre-trained weights and just remove the mismatched parameters?

Thanks!

Oct 18 '21 10:10 scott870430

Hi @Nial4, I have run into a data problem. Could you give me some advice? I use swin_transformer as the backbone for segmentation. My train_size is 256 and window_size is set to 8, but when I train it I get this error:

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [12, 192, 1, 1], but got 3-dimensional input of size [16, 1, 1] instead

I have tried many things, such as adding images=torch.unsqueeze(images, dim=0), but none of them worked. Thanks for your time.

Oct 19 '21 07:10 FrankWuuu

Hi @Nial4, thanks for your suggestions! But I am not sure whether it is OK to directly use the pre-trained weights to fine-tune my model on different tasks with a modified window size. Thanks!

Oct 19 '21 12:10 zhuole1025

+1 to @FrankWuuu's question above.

Oct 25 '21 13:10 daixiangzi

Please see Swin V2 for an approach that deals with varying window resolutions.

Dec 20 '21 09:12 ancientmooner

@scott870430 You can try bicubic interpolation to reuse the pretrained model weights with a different window size.

Dec 20 '21 10:12 ancientmooner
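
A minimal sketch of the bicubic interpolation idea, assuming the relative position bias table is stored as a `((2*window_size-1)**2, num_heads)` tensor as in the official implementation. The helper name `resize_bias_table` is hypothetical, and the random tensor stands in for a real checkpoint entry.

```python
import torch
import torch.nn.functional as F


def resize_bias_table(table, new_window_size):
    """Resize a (L, num_heads) relative position bias table to a new window size."""
    old_len, num_heads = table.shape
    old_side = int(round(old_len ** 0.5))  # equals 2 * old_window_size - 1
    new_side = 2 * new_window_size - 1
    if old_side * old_side != old_len:
        raise ValueError("table length is not a perfect square")
    if old_side == new_side:
        return table  # window size unchanged, nothing to do
    # Treat each head as one channel of a small 2D image of relative offsets.
    grid = table.permute(1, 0).reshape(1, num_heads, old_side, old_side)
    grid = F.interpolate(
        grid, size=(new_side, new_side), mode="bicubic", align_corners=False
    )
    return grid.reshape(num_heads, new_side * new_side).permute(1, 0)


table_7 = torch.randn(169, 3)            # stand-in for a window_size=7 table
table_8 = resize_bias_table(table_7, 8)  # resized for window_size=8
print(table_8.shape)                     # torch.Size([225, 3])
```

The `relative_position_index` entries are integer lookup indices derived from the window size rather than learned weights, so they would normally be removed from the checkpoint and rebuilt by the model instead of being interpolated.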
