The following is the error we get when we try to pass an input size of 512*2, 512*3.
Are different input resolutions/sizes not supported currently?
Traceback (most recent call last):
  File "train.py", line 149, in <module>
    train(config)
  File "train.py", line 57, in train
    model_module = DonutModelPLModule(config)
  File "/home/souvic/Desktop/upwork1/donut/donut/lightning_module.py", line 35, in __init__
    ignore_mismatched_sizes=True,
  File "/home/souvic/Desktop/upwork1/donut/donut/donut/model.py", line 595, in from_pretrained
    model = super(DonutModel, cls).from_pretrained(pretrained_model_name_or_path, revision="official", *model_args, **kwargs)
  File "/home/souvic/anaconda3/envs/donut_official/lib/python3.7/site-packages/transformers/modeling_utils.py", line 2113, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/home/souvic/Desktop/upwork1/donut/donut/donut/model.py", line 387, in __init__
    name_or_path=self.config.name_or_path,
  File "/home/souvic/Desktop/upwork1/donut/donut/donut/model.py", line 70, in __init__
    num_classes=0,
  File "/home/souvic/anaconda3/envs/donut_official/lib/python3.7/site-packages/timm/models/swin_transformer.py", line 500, in __init__
    downsample=PatchMerging if (i < self.num_layers - 1) else None
  File "/home/souvic/anaconda3/envs/donut_official/lib/python3.7/site-packages/timm/models/swin_transformer.py", line 408, in __init__
    for i in range(depth)])
  File "/home/souvic/anaconda3/envs/donut_official/lib/python3.7/site-packages/timm/models/swin_transformer.py", line 408, in <listcomp>
    for i in range(depth)])
  File "/home/souvic/anaconda3/envs/donut_official/lib/python3.7/site-packages/timm/models/swin_transformer.py", line 281, in __init__
    mask_windows = window_partition(img_mask, self.window_size)  # num_win, window_size, window_size, 1
  File "/home/souvic/anaconda3/envs/donut_official/lib/python3.7/site-packages/timm/models/swin_transformer.py", line 111, in window_partition
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
RuntimeError: shape '[1, 25, 10, 38, 10, 1]' is invalid for input of size 98304
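For reference, the two numbers in the final RuntimeError can be reproduced by hand. This is only a rough sketch: the patch size of 4 is an assumption (it is the usual Swin default), the window size of 10 is read off the failing shape, and `window_partition_check` is a hypothetical helper, not part of timm or donut.

```python
# Assumed values: patch size 4 is the usual Swin default; window size 10 is
# read off the failing shape [1, 25, 10, 38, 10, 1] in the traceback.
PATCH_SIZE = 4
WINDOW_SIZE = 10


def window_partition_check(height, width, patch_size=PATCH_SIZE, window_size=WINDOW_SIZE):
    """Return (elements_available, elements_requested) for the first-stage mask.

    Hypothetical helper that mirrors the arithmetic of the failing view() call.
    """
    # Size of the patch grid after patch embedding.
    grid_h, grid_w = height // patch_size, width // patch_size
    # The attention mask has batch size 1 and a single channel.
    available = grid_h * grid_w
    # view() asks for (H // ws) * ws * (W // ws) * ws elements.
    requested = (grid_h // window_size) * window_size * (grid_w // window_size) * window_size
    return available, requested


available, requested = window_partition_check(512 * 2, 512 * 3)
print(available, requested)  # 98304 95000
```

Under these assumptions the patch grid is 256 x 384, neither side is divisible by 10, and the reshape asks for fewer elements than the tensor holds, which is what the error reports.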
Hi @Souvic, can you check whether you changed input_size in the config (e.g. https://github.com/clovaai/donut/blob/master/config/train_cord.yaml#L8) to a proper value?
Hi, this issue is related to the window_size of the image encoder (Swin).
For donut-base, set the size of each axis to a multiple of 320, e.g., [640, 640], [960, 640], [1280, 960], etc.
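To pick a valid size near the one you want, something like the following may help. It is a minimal sketch, not part of donut: `snap_to_valid` is a hypothetical helper, and the breakdown of 320 into patch size, patch-merging factor, and window size is my assumption about where the number comes from.

```python
import math

# From the comment above: each axis should be a multiple of 320 for donut-base.
# Assumption: 320 = 4 (patch size) * 8 (three 2x patch-merging steps) * 10 (window size).
STRIDE = 320


def snap_to_valid(size, stride=STRIDE):
    """Return the nearest multiples of `stride` at or below and at or above `size`.

    Hypothetical helper for illustration; sizes below one stride snap up to it.
    """
    lower = max(stride, (size // stride) * stride)
    upper = math.ceil(size / stride) * stride
    return lower, upper


for axis in (512 * 2, 512 * 3):
    print(axis, snap_to_valid(axis))
# 1024 (960, 1280)
# 1536 (1280, 1600)
```

So for the 512*2, 512*3 request, the closest sizes that satisfy the rule would be [960, 1280] when rounding down or [1280, 1600] when rounding up.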
Related comment: https://github.com/clovaai/donut/issues/22#issuecomment-1214643015
Hope this helps :) Feel free to reopen this or open another issue if you have anything new to share.
Hey @gwkrsrch,
I do not know much about computer vision. Could you please help me understand this? I am very sorry to bother you, but I am very confused.