
Question about resizing the position embedding to 16x16

Open ambipomyan opened this issue 2 years ago • 4 comments

Hi all, the size of the position embedding given by the pre-trained ViT-S/14 is 37x37. Do you have any suggestions for resizing it to 16x16? Thank you!

ambipomyan avatar Oct 17 '23 13:10 ambipomyan

The model does the resizing automatically during inference, I think, no?

qasfb avatar Oct 18 '23 09:10 qasfb
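For reference, a minimal sketch of relying on that automatic resizing, assuming the torch.hub entry points from the DINOv2 README (input sides must be multiples of the patch size, 14):

```python
import torch

# Load the pretrained ViT-S/14 via torch.hub (entry point from the DINOv2 README).
model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
model.eval()

# A 224x224 input gives a 16x16 patch grid; the model interpolates its
# stored 37x37 position embedding to match, so no manual resizing is needed.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    feats = model(x)

print(feats.shape)  # (1, 384): the CLS embedding for ViT-S
```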

> The model does the resizing automatically during inference, I think, no?

Hi, thank you for your reply! I am trying to create a ViT-S/14 with input image size 224x224 while keeping the patch size of 14, so the position embedding has size 1x257x384, which does not match the 1x1370x384 of the pre-trained ViT-S/14. To make better use of the pre-trained weights, I truncated the pre-trained position embedding to 1x1285x384 and then applied torch.mean() to average it down to 1x257x384 by a factor of 5. However, I think the weights are still somewhat wasted this way. Do you have any suggestions for reshaping the position embedding? Thank you!

ambipomyan avatar Oct 19 '23 15:10 ambipomyan
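A more standard alternative to truncating and averaging is to treat the 1369 patch positions as a 37x37 grid and interpolate it down to 16x16, keeping the CLS position as-is, similar in spirit to what the model's own inference-time resizing does. A minimal sketch, assuming the 1x1370x384 layout described above with the CLS position stored first:

```python
import torch
import torch.nn.functional as F

def resize_pos_embed(pos_embed: torch.Tensor, new_grid: int = 16) -> torch.Tensor:
    """Interpolate a (1, 1 + G*G, D) position embedding to (1, 1 + new_grid**2, D).

    Assumes the first token is the CLS position, as in the DINOv2 checkpoints
    (1x1370x384 = 1 CLS + 37*37 patch positions for ViT-S/14).
    """
    cls_pos, patch_pos = pos_embed[:, :1], pos_embed[:, 1:]
    dim = pos_embed.shape[-1]
    old_grid = int(patch_pos.shape[1] ** 0.5)  # 37 for the pretrained ViT-S/14
    # (1, G*G, D) -> (1, D, G, G) so F.interpolate sees a spatial grid
    patch_pos = patch_pos.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
    patch_pos = F.interpolate(patch_pos, size=(new_grid, new_grid),
                              mode='bicubic', align_corners=False)
    # back to (1, new_grid*new_grid, D) and re-attach the CLS position
    patch_pos = patch_pos.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, dim)
    return torch.cat([cls_pos, patch_pos], dim=1)

# e.g. 1x1370x384 -> 1x257x384
new_pe = resize_pos_embed(torch.randn(1, 1 + 37 * 37, 384), new_grid=16)
assert new_pe.shape == (1, 257, 384)
```

The resized tensor can then be copied into the 1x257x384 position embedding of the new model before loading the remaining pre-trained weights, so all 37x37 positions contribute instead of being discarded.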

Hello! I'm having exactly the same problem... Any solutions?

davidanglada avatar Nov 17 '23 11:11 davidanglada

Does that mean the pretrained ViT-S/14 actually works with a 37x37 grid of 6x6 patches on images of size 224?

davidanglada avatar Nov 17 '23 11:11 davidanglada
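For reference on the numbers: the patch size of ViT-S/14 is fixed at 14x14, so the 37x37 grid corresponds to a 518x518 input (37 x 14 = 518, giving 1 + 37^2 = 1370 positions), while a 224x224 input gives 224 / 14 = 16 patches per side (1 + 16^2 = 257 positions). The patches are never 6x6; only the grid size changes with resolution.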