donut Different input resolution throws error

The following error occurs when we pass an input size of 512*2, 512*3. Are different input resolutions/sizes currently not supported?

Traceback (most recent call last):
  File "train.py", line 149, in <module>
    train(config)
  File "train.py", line 57, in train
    model_module = DonutModelPLModule(config)
  File "/home/souvic/Desktop/upwork1/donut/donut/lightning_module.py", line 35, in __init__
    ignore_mismatched_sizes=True,
  File "/home/souvic/Desktop/upwork1/donut/donut/donut/model.py", line 595, in from_pretrained
    model = super(DonutModel, cls).from_pretrained(pretrained_model_name_or_path, revision="official", *model_args, **kwargs)
  File "/home/souvic/anaconda3/envs/donut_official/lib/python3.7/site-packages/transformers/modeling_utils.py", line 2113, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/home/souvic/Desktop/upwork1/donut/donut/donut/model.py", line 387, in __init__
    name_or_path=self.config.name_or_path,
  File "/home/souvic/Desktop/upwork1/donut/donut/donut/model.py", line 70, in __init__
    num_classes=0,
  File "/home/souvic/anaconda3/envs/donut_official/lib/python3.7/site-packages/timm/models/swin_transformer.py", line 500, in __init__
    downsample=PatchMerging if (i < self.num_layers - 1) else None
  File "/home/souvic/anaconda3/envs/donut_official/lib/python3.7/site-packages/timm/models/swin_transformer.py", line 408, in __init__
    for i in range(depth)])
  File "/home/souvic/anaconda3/envs/donut_official/lib/python3.7/site-packages/timm/models/swin_transformer.py", line 408, in <listcomp>
    for i in range(depth)])
  File "/home/souvic/anaconda3/envs/donut_official/lib/python3.7/site-packages/timm/models/swin_transformer.py", line 281, in __init__
    mask_windows = window_partition(img_mask, self.window_size)  # num_win, window_size, window_size, 1
  File "/home/souvic/anaconda3/envs/donut_official/lib/python3.7/site-packages/timm/models/swin_transformer.py", line 111, in window_partition
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
RuntimeError: shape '[1, 25, 10, 38, 10, 1]' is invalid for input of size 98304

Sep 11 '22 04:09 Souvic

Hi @Souvic, can you check whether you changed input_size in the config (e.g. https://github.com/clovaai/donut/blob/master/config/train_cord.yaml#L8) to a proper value?

Sep 20 '22 09:09 long8v

Hi, this issue is related to the window_size of the image encoder (Swin). For donut-base, set the size of each axis to a multiple of 320, e.g., [640, 640], [960, 640], [1280, 960], etc. Related comment: https://github.com/clovaai/donut/issues/22#issuecomment-1214643015 Hope this helps :) Feel free to reopen this or open another issue if you have anything new to share.
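One possible way to see where the 320 comes from is sketched below. The window size of 10 is read off the shape in the traceback; the patch size of 4 and the four encoder stages (each later stage halving the feature map) are assumptions about the donut-base Swin encoder, and the helper names are hypothetical rather than part of donut or timm. Under those assumptions, every stage's feature map has to split evenly into 10x10 windows, which gives 4 * 2^3 * 10 = 320.

```python
# Sketch only: patch_size=4 and num_stages=4 are assumed values for donut-base;
# window_size=10 is taken from the shape in the error message.

def required_multiple(patch_size=4, window_size=10, num_stages=4):
    """Step that each image side must be a multiple of.

    The image is cut into patches, then halved (num_stages - 1) times;
    the smallest feature map must still divide evenly into windows.
    """
    return patch_size * 2 ** (num_stages - 1) * window_size


def first_stage_grid(input_size, patch_size=4):
    """Feature-map height and width seen by the first Swin stage."""
    height, width = input_size
    return height // patch_size, width // patch_size


def is_valid_input_size(input_size, multiple=None):
    """True if both sides are multiples of the required step."""
    multiple = multiple or required_multiple()
    return all(side % multiple == 0 for side in input_size)


def round_up_to_valid(input_size, multiple=None):
    """Round each side up to the next valid multiple."""
    multiple = multiple or required_multiple()
    return [-(-side // multiple) * multiple for side in input_size]


if __name__ == "__main__":
    size = [512 * 2, 512 * 3]           # the size from the original report
    grid = first_stage_grid(size)       # (256, 384); 256 * 384 == 98304
    print(required_multiple())          # 320
    print(grid, grid[0] * grid[1])
    print(is_valid_input_size(size))    # False: 256 and 384 are not multiples of 10
    print(round_up_to_valid(size))      # nearest larger sizes that pass the check
```

With the reported size, the first stage sees a 256 x 384 grid (98304 elements, the number in the error), and neither 256 nor 384 divides into windows of 10, which is why the `view` call fails.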

Dec 16 '22 04:12 gwkrsrch

Hey @gwkrsrch, I do not know much about computer vision. Could you please help me understand this? I am very sorry to bother you, but I am very confused.

Jun 18 '24 22:06 praneet0017
