model and weight are not compatible
I downloaded 'dinov2_vitl14_pretrain.pth' locally and used the `vit_large` function to build the model. I ran the following code, but the weights and the model are not compatible:
```python
import torch
from dinov2.models.vision_transformer import vit_large

model = vit_large(patch_size=14)
state_dict = torch.load('checkpoint/dinov2_vitl14_pretrain.pth', map_location='cpu')
model.load_state_dict(state_dict, strict=False)
```
I got the error:
"size mismatch for pos_embed: copying a param with shape torch.Size([1, 1370, 1024]) from checkpoint, the shape in current model is torch.Size([1, 257, 1024])"
I have also encountered the same problem. Have you found the reason?
Hello,
You should add the following parameter when initialising `vit_large`:

```python
model = vit_large(patch_size=14, img_size=518)
```
This is because we published weights of models trained on images of size 518x518 (see Section 4 'Adapting the resolution' in the paper).
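Putting this together with the code from the original post, a minimal sketch (same local checkpoint path as above; `strict=False` is kept from the original snippet, so inspect the returned key lists to confirm everything actually loaded):

```python
import torch
from dinov2.models.vision_transformer import vit_large

# img_size=518 gives pos_embed (518/14)^2 + 1 = 1370 entries,
# matching the published checkpoint.
model = vit_large(patch_size=14, img_size=518)
state_dict = torch.load('checkpoint/dinov2_vitl14_pretrain.pth', map_location='cpu')
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print('missing keys:', missing)        # weights the checkpoint did not provide
print('unexpected keys:', unexpected)  # checkpoint entries the model did not expect
model.eval()
```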
I understand. Thank you for your patience.
I have noticed that there are some differences between this approach and loading the model over the network (e.g. via torch.hub), but I haven't yet identified the reason for these differences.
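One plausible cause, worth verifying against the repo's hub entrypoints: if I read `dinov2/hub/backbones.py` correctly, `torch.hub.load` builds the backbone with extra arguments (my reading is `init_values=1.0` and `block_chunks=0`; treat both as assumptions), and a different `block_chunks` value changes the parameter key names, so with `strict=False` many weights can silently fail to load. A sketch for diagnosing the discrepancy:

```python
import torch
from dinov2.models.vision_transformer import vit_large

# Local build. init_values and block_chunks mirror my reading of the
# hub builder; verify them against dinov2/hub/backbones.py.
local_model = vit_large(patch_size=14, img_size=518, init_values=1.0, block_chunks=0)
state_dict = torch.load('checkpoint/dinov2_vitl14_pretrain.pth', map_location='cpu')
missing, unexpected = local_model.load_state_dict(state_dict, strict=False)
print('missing:', missing)        # non-empty lists point at mismatched key names
print('unexpected:', unexpected)

# Reference build straight from torch.hub (downloads the weights itself).
hub_model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitl14')

# Compare outputs on the same input; both models in eval mode.
local_model.eval()
hub_model.eval()
x = torch.randn(1, 3, 518, 518)
with torch.no_grad():
    diff = (local_model(x) - hub_model(x)).abs().max()
print('max abs difference:', diff.item())  # should be ~0 if the builds match
```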
Hi, you can calculate it in the following way:

$$
N_{\text{patch}} = \frac{size_{\text{img}}^2}{size_{\text{patch}}^2}, \qquad \mathrm{len}(embedding_{\text{pos}}) = N_{\text{patch}} + 1
$$
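Plugging in the published checkpoint's values reproduces the shapes from the error message: with $size_{\text{img}} = 518$ and $size_{\text{patch}} = 14$,

$$
N_{\text{patch}} = \frac{518^2}{14^2} = 37^2 = 1369, \qquad \mathrm{len}(embedding_{\text{pos}}) = 1369 + 1 = 1370,
$$

while the default $size_{\text{img}} = 224$ gives $16^2 + 1 = 257$, which is the shape the mismatched model expected.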