
model and weights are not compatible

Open develop-productivity opened this issue 1 year ago • 5 comments

I downloaded 'dinov2_vitl14_pretrain.pth' locally and used the 'vit_large' function to build the model. I ran the following code, but the weights and the model are not compatible:

import torch
from dinov2.models.vision_transformer import vit_large

model = vit_large(patch_size=14)
state_dict = torch.load('checkpoint/dinov2_vitl14_pretrain.pth', map_location='cpu')
model.load_state_dict(state_dict, strict=False)

I got the error:

"size mismatch for pos_embed: copying a param with shape torch.Size([1, 1370, 1024]) from checkpoint, the shape in current model is torch.Size([1, 257, 1024])"

develop-productivity avatar Sep 16 '23 07:09 develop-productivity

I have also encountered the same problem. Have you found the reason?

AriesChen-UPC avatar Sep 18 '23 07:09 AriesChen-UPC

Hello, you should add the following parameter when initialising vit_large: model = vit_large(patch_size=14, img_size=518). This is because we published weights of models trained on images of size 518x518 (see Section 4, 'Adapting the resolution', in the paper).
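Putting that together with the snippet from the question, a minimal sketch of the corrected loading code (the checkpoint path is the one used above and is an assumption about the local layout):

import torch
from dinov2.models.vision_transformer import vit_large

# Build the backbone at the resolution the released weights were trained on:
# 518x518 images with patch size 14 -> 37x37 patches + 1 CLS token = 1370 positions.
model = vit_large(patch_size=14, img_size=518)

state_dict = torch.load('checkpoint/dinov2_vitl14_pretrain.pth', map_location='cpu')

# With strict=False it is worth printing what, if anything, did not match.
result = model.load_state_dict(state_dict, strict=False)
print(result.missing_keys, result.unexpected_keys)
model.eval()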

TheoMoutakanni avatar Sep 18 '23 08:09 TheoMoutakanni

I understand. Thank you for your patience.

develop-productivity avatar Sep 18 '23 08:09 develop-productivity

I have noticed that there are some differences between this approach and loading the model from the network. I haven't yet identified the reason for these differences.
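One way to pin down where the two approaches diverge is to load the model both ways and diff the state dicts; a rough sketch, assuming network access for torch.hub and the same local checkpoint path as above:

import torch
from dinov2.models.vision_transformer import vit_large

# Model built manually and loaded from the local checkpoint.
local_model = vit_large(patch_size=14, img_size=518)
state_dict = torch.load('checkpoint/dinov2_vitl14_pretrain.pth', map_location='cpu')
local_model.load_state_dict(state_dict, strict=False)

# Model loaded through the official torch.hub entrypoint.
hub_model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitl14')

local_sd, hub_sd = local_model.state_dict(), hub_model.state_dict()

# Keys that exist in only one of the two models point to differing constructor arguments.
print('only local:', sorted(set(local_sd) - set(hub_sd)))
print('only hub:  ', sorted(set(hub_sd) - set(local_sd)))

# For shared keys, check whether the tensors themselves match.
for key in sorted(set(local_sd) & set(hub_sd)):
    if local_sd[key].shape != hub_sd[key].shape or not torch.equal(local_sd[key], hub_sd[key]):
        print('differs:', key)

If the key sets differ, comparing the keyword arguments vit_large uses by default with the ones the hub builder passes (see dinov2/hub/backbones.py in the repo) should reveal which constructor option is responsible.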

AriesChen-UPC avatar Sep 21 '23 01:09 AriesChen-UPC

I have also encountered the same problem. Have you found the reason?

Hi, you can calculate it as follows:

$$N_{patch} = \frac{size_{img}^{2}}{size_{patch}^{2}}, \qquad len(embedding_{pos}) = N_{patch} + 1$$
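As a quick sanity check, a small helper (hypothetical, just restating the formula above) reproduces both numbers from the error message:

# Length of the positional embedding: number of patches plus the CLS token.
def pos_embed_len(img_size: int, patch_size: int) -> int:
    return (img_size // patch_size) ** 2 + 1

print(pos_embed_len(518, 14))  # 1370 -> the checkpoint's pos_embed
print(pos_embed_len(224, 14))  # 257  -> the pos_embed of a model built with the default img_size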

develop-productivity avatar Sep 21 '23 02:09 develop-productivity