detrex
`dino_eva_02_vitdet_*_1024_*` configs throw tensor shape mismatch error
Description
I tested all of the `dino_eva_02_vitdet` models from here, and the ones with image_size=1024 seem to be failing.
Used this image from the installation tutorial.
Working:
- projects/dino_eva/configs/dino-eva-02/dino_eva_02_vitdet_l_4attn_1280_lrd0p8_4scale_12ep.py
- projects/dino_eva/configs/dino-eva-02/dino_eva_02_vitdet_l_8attn_1536_lrd0p8_4scale_12ep.py
- projects/dino_eva/configs/dino-eva-02/dino_eva_02_vitdet_b_6attn_win32_1536_lrd0p7_4scale_12ep.py
Not working:
- projects/dino_eva/configs/dino-eva-02/dino_eva_02_vitdet_l_4attn_1024_lrd0p8_4scale_12ep.py
- projects/dino_eva/configs/dino-eva-02/dino_eva_02_vitdet_b_4attn_1024_lrd0p7_4scale_12ep.py
Looking at the logs, the culprit seems to be this snippet of code: https://github.com/IDEA-Research/detrex/blob/03e02cb3182112569724092fc1c6935b61d54141/projects/dino_eva/modeling/dino.py#L529-L532
Log info example
Command
python demo/demo.py --config-file projects/dino_eva/configs/dino-eva-02/dino_eva_02_vitdet_b_4attn_1024_lrd0p7_4scale_12ep.py \
--input idea.jpg \
--output visualized_results_eva_no_window_gpu.jpg \
--opts train.init_checkpoint="dino_eva_02_in21k_pretrain_vitdet_b_4attn_1024_lrd0p7_4scale_12ep.pth"
Logs
[07/10 11:00:20 detectron2]: Arguments: Namespace(config_file='projects/dino_eva/configs/dino-eva-02/dino_eva_02_vitdet_b_4attn_1024_lrd0p7_4scale_12ep.py', webcam=False, video_input=None, input=['idea.jpg'], output='visualized_results_eva_no_window_gpu.jpg', min_size_test=800, max_size_test=1333, img_format='RGB', metadata_dataset='coco_2017_val', confidence_threshold=0.5, opts=['train.init_checkpoint=dino_eva_02_in21k_pretrain_vitdet_b_4attn_1024_lrd0p7_4scale_12ep.pth'])
======== shape of rope freq torch.Size([256, 64]) ========
======== shape of rope freq torch.Size([4096, 64]) ========
[07/10 11:00:24 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from dino_eva_02_in21k_pretrain_vitdet_b_4attn_1024_lrd0p7_4scale_12ep.pth ...
[07/10 11:00:24 fvcore.common.checkpoint]: [Checkpointer] Loading from dino_eva_02_in21k_pretrain_vitdet_b_4attn_1024_lrd0p7_4scale_12ep.pth ...
0% 0/1 [00:00<?, ?it/s]
/content/detrex/./projects/dino_eva/modeling/dino.py:530: UserWarning: square_size=1024, is smaller than max_size=1199 in batch
warnings.warn("square_size={}, is smaller than max_size={} in batch".format(
0% 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/content/detrex/demo/demo.py", line 141, in <module>
predictions, visualized_output = demo.run_on_image(img, args.confidence_threshold)
File "/content/detrex/./demo/predictors.py", line 80, in run_on_image
predictions = self.predictor(image)
File "/content/detrex/./demo/predictors.py", line 207, in __call__
predictions = self.model([inputs])[0]
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/content/detrex/./projects/dino_eva/modeling/dino.py", line 198, in forward
features = self.backbone(images.tensor) # output feature dict
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/content/detrex/./detrex/modeling/backbone/eva.py", line 583, in forward
bottom_up_features = self.net(x)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/content/detrex/./detrex/modeling/backbone/eva_02.py", line 431, in forward
x = blk(x)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/content/detrex/./detrex/modeling/backbone/eva_02.py", line 275, in forward
x = self.attn(x)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/content/detrex/./detrex/modeling/backbone/eva_02.py", line 117, in forward
q = self.rope(q).type_as(v)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/content/detrex/./detrex/modeling/backbone/eva_02_utils.py", line 349, in forward
return t * self.freqs_cos + rotate_half(t) * self.freqs_sin
RuntimeError: The size of tensor a (5476) must match the size of tensor b (4096) at non-singleton dimension 2
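Arithmetic check
The two sizes in the RuntimeError look like squared token-grid sizes, and they line up with the numbers in the UserWarning above. The sketch below is only a sanity check, not code from detrex. It assumes a patch size of 16 and a non-overlapping patch embedding (neither is stated in the logs), and `num_tokens` is a hypothetical helper written for this illustration.

```python
import math

PATCH_SIZE = 16  # assumption: ViT patch size, not stated in the logs


def num_tokens(side: int, patch: int = PATCH_SIZE) -> int:
    """Tokens from a non-overlapping patch embedding on a side x side input."""
    # A stride-`patch` convolution without padding drops the remainder.
    per_side = side // patch
    return per_side * per_side


rope_tokens = num_tokens(1024)   # square_size reported in the UserWarning
input_tokens = num_tokens(1199)  # max_size reported in the UserWarning

print(rope_tokens, math.isqrt(rope_tokens))    # 4096 64
print(input_tokens, math.isqrt(input_tokens))  # 5476 74
```

Under those assumptions, 4096 matches the larger "shape of rope freq" table (64 x 64 positions) and 5476 matches a 1199 x 1199 padded input (74 x 74 patches). That would be consistent with the batch being padded to max_size=1199 rather than to square_size=1024, so the backbone sees more tokens than the rotary table was built for. It could also explain why the 1280 and 1536 configs work with the same image, since 1199 fits inside both square sizes.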