dinov2
Question about resizing the position embedding to 16x16
Hi all, the size of the position embedding given by the pre-trained ViT-S/14 is 37x37. Do you have any suggestions for resizing it to 16x16? Thank you!
The model does the resizing automatically during inference, I think, no?
Hi, thank you for your reply! I am trying to create a ViT-S/14 with an input image size of 224x224 and a patch size of 14, so the position embedding has size 1x257x384, which does not match the 1x1370x384 position embedding of the pre-trained ViT-S/14. To make better use of the pre-trained weights, I currently discard part of the pre-trained position embedding to get 1x1285x384 and then apply torch.mean() to average it down to 1x257x384 by a factor of 5. However, I feel the weights are still somewhat wasted this way. Do you have any suggestions for reshaping the position embedding? Thank you!
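For what it's worth, the usual alternative to discarding and averaging is to interpolate the pre-trained 37x37 grid down to 16x16, which I believe is what the repo's interpolate_pos_encoding does automatically at inference time. A minimal sketch, assuming the pre-trained embedding is a 1x1370x384 tensor with the class-token embedding stored first (resize_pos_embed is just a hypothetical helper name):

```python
import torch
import torch.nn.functional as F

def resize_pos_embed(pos_embed, old_grid=37, new_grid=16):
    """Interpolate a ViT position embedding from one square patch grid to another.

    pos_embed: (1, 1 + old_grid*old_grid, dim) tensor, class-token embedding first.
    """
    cls_pos, patch_pos = pos_embed[:, :1], pos_embed[:, 1:]
    dim = patch_pos.shape[-1]
    # (1, N, dim) -> (1, dim, old_grid, old_grid) so we can interpolate spatially
    patch_pos = patch_pos.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
    patch_pos = F.interpolate(
        patch_pos, size=(new_grid, new_grid), mode="bicubic", align_corners=False
    )
    # back to (1, new_grid*new_grid, dim) and re-attach the class token
    patch_pos = patch_pos.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, dim)
    return torch.cat([cls_pos, patch_pos], dim=1)

# hypothetical usage: 1x1370x384 (pre-trained ViT-S/14 at 518x518) -> 1x257x384 (224x224 input)
new_pe = resize_pos_embed(torch.randn(1, 1370, 384))
print(new_pe.shape)  # torch.Size([1, 257, 384])
```

You could then copy the resized tensor into the 224x224 model's position embedding parameter instead of averaging.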
Hello! I'm having exactly the same problem... Any solutions?
Does that mean that the pre-trained ViT-S/14 actually works with 37 patches of 6x6 from images of size 224?