
How to use Dinov2 for self-supervised training?

Open zhanglaoban-kk opened this issue 1 year ago • 7 comments

Hello, I would like to ask: how can I use DINOv2 for self-supervised learning on unlabeled images to obtain pre-trained weights, and then load those pre-trained weights for supervised learning on labeled data (for image classification)?

zhanglaoban-kk avatar Oct 01 '23 09:10 zhanglaoban-kk

Hi team,

Thanks to @zhanglaoban-kk for bringing this up; I have the same problem.

  • For using unlabeled images

Is it like what @TheoMoutakanni mentioned in #142, that the labels in training/eval/testing are only used for evaluation, and all we need to do is set the label to something like '0' or 'np.nan', mocking the ImageNet dataset?
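If that reading is right, a minimal sketch of such a mock could be a wrapper dataset that returns a constant dummy label (the class name and label value here are placeholders, not part of the DINOv2 codebase):

```python
from torch.utils.data import Dataset


class UnlabeledWrapper(Dataset):
    """Wrap a label-free image collection so it mimics a labeled dataset.

    `images` can be any indexable collection of (already transformed)
    images; the constant label 0 is a placeholder, since the SSL loss
    never uses labels and they only matter for k-NN/linear evaluation.
    """

    def __init__(self, images):
        self.images = images

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return self.images[idx], 0  # dummy label
```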

MODEL:
  WEIGHTS: 'https://dl.fbaipublicfiles.com/dinov2/dinov2_vitb14/dinov2_vitb14_pretrain.pth'

However, it reports an error:

...
File "/hpcfs/users/a1781032/default_studio/dinov2/dinov2/train/train.py", line 153, in do_train
    start_iter = checkpointer.resume_or_load(cfg.MODEL.WEIGHTS, resume=resume).get("iteration", -1) + 1
...
    checkpoint_state_dict = checkpoint.pop("model")
KeyError: 'model'

The dinov2_vitb14_pretrain.pth checkpoint does not have a 'model' key; instead it contains the architecture keys directly: 'cls_token', 'pos_embed', 'mask_token', 'patch_embed.proj.weight', 'patch_embed.proj.bias', 'blocks.0.norm1.weight', 'blocks.0.norm1.bias', 'blocks.0.attn.qkv.weight', 'blocks.0.attn.qkv.bias', 'blocks.0.attn.proj.weight', 'blocks.0.attn.proj.bias' ...

How can we load the provided pre-trained checkpoint into the model before training?

Best, Steve

steve-zeyu-zhang avatar Oct 01 '23 15:10 steve-zeyu-zhang

Hi team,

I have some new findings. Looking through #154, @surajyakoa explained that they wanted to fine-tune DINOv2 on some unlabeled data, and @qasfb explained that this requires the full checkpoint, not just the backbone; since the full checkpoints haven't been released, it isn't practical.

Is that correct?

Steve

steve-zeyu-zhang avatar Oct 02 '23 12:10 steve-zeyu-zhang


Sad moment. So it is not possible to fine-tune the current model from the hub?

NikitaRA avatar Oct 30 '23 10:10 NikitaRA

Any news here?

kusstox avatar Nov 17 '23 13:11 kusstox

Not for now. Does anyone know how to run self-distillation using the train file in the repo?

MarioAvolio avatar Nov 18 '23 16:11 MarioAvolio

Hi,

My advice would be not to load the weights using the checkpointer's resume function, as it also expects the optimizer buffers etc. Instead, you can launch a normal training from scratch and load the pretrained weights before the first iteration with torch.load and model.load_state_dict. Make sure you load the weights into both the student and the teacher.

You may need to slightly rename the checkpoint keys: blocks.0.attn.qkv.weight may be named something like module.backbone.blocks.0.0.attn.qkv.weight or similar. I don't remember the exact mapping, but it should not be hard to find if you compare the keys on both sides.
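For the renaming, a simple prefix remap is often enough. The target prefix below (`backbone.`) is only a guess; print `model.state_dict().keys()` and compare against the checkpoint keys to find the real mapping:

```python
def remap_keys(state_dict, prefix="backbone."):
    """Prepend a prefix to every checkpoint key.

    The 'backbone.' prefix is illustrative; derive the real one by
    comparing checkpoint keys with model.state_dict().keys().
    """
    return {prefix + k: v for k, v in state_dict.items()}
```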

You might also need to use strict=False, because the checkpoint does not include the head weights. However, make sure you print the message returned by load_state_dict, so that you know whether the loading succeeded; silent load failures are a common source of bugs.

The only real question is how well the training will work with a pretrained backbone and a randomly initialized head. It may be beneficial to freeze the backbone and train only the heads for a few thousand iterations, so they have a bit of time to fit, then unfreeze the whole model and start the real training.
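The freeze-then-unfreeze schedule above can be sketched as follows; the `backbone`/`head` submodule names are illustrative stand-ins, not the actual attribute names in the repo:

```python
import torch.nn as nn


def set_requires_grad(module, flag):
    """Toggle gradient updates for all parameters of a module."""
    for p in module.parameters():
        p.requires_grad = flag


# Illustrative model: a pretrained backbone plus a random-init head.
model = nn.Module()
model.backbone = nn.Linear(8, 8)  # stands in for the pretrained ViT backbone
model.head = nn.Linear(8, 2)      # stands in for the randomly initialized head

# Phase 1: freeze the backbone and train only the head for a while.
set_requires_grad(model.backbone, False)
# ... run a few thousand iterations here ...

# Phase 2: unfreeze everything and start the real training.
set_requires_grad(model.backbone, True)
```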

Best of luck,

Tim

TimDarcet avatar Nov 23 '23 16:11 TimDarcet


Hi Steve, I have the same requirement. I also want to train DINOv2 on unlabeled images. Does it work with the mocked ImageNet dataset? Could you share some details?

Best, Luo

risingClouds avatar Feb 03 '24 03:02 risingClouds