Some Questions about model distillation
Thanks for your work! I have some questions about model distillation. From the paper: "we leverage the same training loop with a few exceptions: we use a larger model as a frozen teacher, keep a spare EMA of the student that we use as our final model, remove the masking and stochastic depth, and, apply the iBOT loss on the two global crops."
- I can only get the ViT-g backbone pretrained model. Does "frozen teacher" include the "dino head" and the "ibot head"?
- What does "keep a spare EMA of the student" mean? Are the student model parameters updated with EMA? The student and the teacher are not the same model.
- If you look at the default training config for ViT-g, a separate head is used for iBOT (two heads: a DINO head and an iBOT head). The frozen teacher should include both of these frozen heads, since this is distillation and you want the joint embedding not to change.
- Keeping a spare EMA of the student essentially means creating a copy of the student and updating it by exponential moving average at a certain frequency. It can be updated in the same way the teacher is updated in the training code, dinov2/train/ssl_meta_arch.py (see the sketch after the note below).
Note: The distillation code was not included in this repository. You cannot use ssl_meta_arch.py to do distillation as is. You would need to modify it to include the student EMA and to load different models for the teacher and the student. You would also need to create a method to update the EMA, similar to the way the frozen teacher is updated (this deals with collecting all the gradients in FSDP). I have created a fork of this repository with some distillation code, which can be found here.
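To make the two points above concrete, here is a minimal sketch of the setup, assuming plain (non-FSDP) modules. The `build_*` factories and module names are placeholders, not code from this repository or from the fork:

```python
import copy

import torch
import torch.nn as nn


def build_distillation_modules(build_teacher_backbone, build_student_backbone,
                               build_dino_head, build_ibot_head):
    """Assemble the frozen teacher, the student, and the spare EMA copy.

    The build_* arguments are hypothetical factories standing in for whatever
    constructs the backbones and heads in your setup (e.g. the components
    configured in the ViT-g training config).
    """
    # Frozen teacher: backbone + DINO head + iBOT head, all frozen so the
    # joint embedding does not change during distillation.
    teacher = nn.ModuleDict({
        "backbone": build_teacher_backbone(),
        "dino_head": build_dino_head(),
        "ibot_head": build_ibot_head(),
    })
    teacher.eval()
    for p in teacher.parameters():
        p.requires_grad_(False)

    # Student: the smaller model being distilled, with its own heads.
    student = nn.ModuleDict({
        "backbone": build_student_backbone(),
        "dino_head": build_dino_head(),
        "ibot_head": build_ibot_head(),
    })

    # "Spare EMA of the student": a frozen copy that is updated by exponential
    # moving average and kept as the final model.
    student_ema = copy.deepcopy(student)
    for p in student_ema.parameters():
        p.requires_grad_(False)

    return teacher, student, student_ema


@torch.no_grad()
def update_student_ema(student_ema, student, momentum=0.999):
    """Blend the EMA copy toward the current student weights.

    Call this after each optimizer step (or at some fixed frequency). Under
    FSDP the parameters are sharded, so the real training code needs the
    gather/consolidation handling mentioned in the note above; this sketch
    assumes plain, unsharded modules.
    """
    for ema_p, p in zip(student_ema.parameters(), student.parameters()):
        ema_p.mul_(momentum).add_(p, alpha=1.0 - momentum)
```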
Thanks for your great work! Did you reproduce their distillation result?
@usryokousha Thanks for your great work! Have you reproduced their distillation result?
@usryokousha your code has a small error when using copy.deepcopy on an nn.ModuleDict of PyTorch models:
```
(<class 'RuntimeError'>, RuntimeError('Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment. If you were attempting to deepcopy a module, this may be because of a torch.nn.utils.weight_norm usage, see https://github.com/pytorch/pytorch/pull/103001'), <traceback object at 0x7f11a4044480>)
```

```python
self.student = nn.ModuleDict(student_model_dict)
self.teacher = nn.ModuleDict(teacher_model_dict)
self.student_shadow = copy.deepcopy(self.student)  # This line causes the error
```
How can we fix this?
As for the bug RuntimeError('Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment.'):
I referred to this link: https://github.com/pytorch/pytorch/issues/102981
In dino_head.py, I changed `from torch.nn.utils import weight_norm` to `from torch.nn.utils.parametrizations import weight_norm`.
This requires torch >= 2.1.0. In addition, I commented out the line `self.last_layer.weight_g.data.fill_(1)`.
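For reference, here is a minimal sketch of the difference between the two APIs (assumes torch >= 2.1; the layer dimensions are illustrative, and the attribute path comes from the parametrize machinery, where `original0` holds the weight-norm magnitude g):

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import weight_norm  # new parametrized API

# With the old torch.nn.utils.weight_norm, the wrapped layer exposed weight_g /
# weight_v attributes, which is what dino_head.py touches via
#   self.last_layer.weight_g.data.fill_(1)
last_layer = weight_norm(nn.Linear(256, 65536, bias=False))

# The parametrized version stores magnitude and direction under
# last_layer.parametrizations.weight instead, so weight_g no longer exists.
# Either comment the fill_(1) line out, or set the magnitude through the new path:
with torch.no_grad():
    last_layer.parametrizations.weight.original0.fill_(1)
```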