dinov2

Are different architectures supported for the student and teacher?

Open AgentEXPL opened this issue 2 years ago • 4 comments

When the teacher and student models are built in this code, the same parameter, args.arch, is used for both.

As stated in the paper, the smaller models are distilled from the largest model (a frozen teacher). How can this be achieved with the code above?

AgentEXPL, Aug 25 '23 10:08

The code does not support distillation at the moment; you would need to hack it a little so that you:

  • initialize and freeze a teacher along with its heads (which we don't provide)
  • train a student with this teacher
  • apply the masked-image-modeling loss to all the tokens (not only the subset that is masked)
  • track an EMA of the student for evaluation
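The four steps above can be sketched roughly as follows. This is a minimal, framework-free illustration under loud assumptions, not the dinov2 implementation: the "models" are hypothetical toy linear heads over precomputed patch-token features, and `forward`, `token_loss`, `student_step` and `ema_update` are illustrative names, not functions from the repository. There is no DINO/iBOT head, no centering, and no input masking; the loss is a plain teacher-to-student cross-entropy applied to every token, the teacher is never updated, and an EMA copy of the student is kept for evaluation.

```python
import math
import random

def softmax(logits, temp):
    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [z / temp for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def forward(weights, tokens):
    # Hypothetical toy head: one linear map applied to every patch-token feature.
    n_in, n_out = len(weights), len(weights[0])
    return [[sum(x[i] * weights[i][k] for i in range(n_in)) for k in range(n_out)]
            for x in tokens]

def token_loss(teacher_logits, student_logits, t_temp, s_temp, token_mask=None):
    # Cross-entropy H(teacher, student) averaged over the selected tokens.
    # token_mask=None: every token (the distillation setting described above);
    # list of bools: only the masked subset (the self-supervised variant).
    idx = [i for i in range(len(teacher_logits)) if token_mask is None or token_mask[i]]
    loss = 0.0
    for i in idx:
        t = softmax(teacher_logits[i], t_temp)
        s = softmax(student_logits[i], s_temp)
        loss -= sum(tk * math.log(sk) for tk, sk in zip(t, s))
    return loss / len(idx)

def student_step(student_w, teacher_w, tokens, lr, t_temp, s_temp):
    # One SGD step on the student; teacher_w is only read, never written (frozen teacher).
    t_logits = forward(teacher_w, tokens)
    s_logits = forward(student_w, tokens)
    loss = token_loss(t_logits, s_logits, t_temp, s_temp)  # loss before the update
    n = len(tokens)
    for x, tl, sl in zip(tokens, t_logits, s_logits):
        t = softmax(tl, t_temp)
        s = softmax(sl, s_temp)
        # dH/dz_k = (s_k - t_k) / s_temp per token, averaged over all n tokens.
        g = [(sk - tk) / (s_temp * n) for sk, tk in zip(s, t)]
        for i, xi in enumerate(x):
            for k, gk in enumerate(g):
                student_w[i][k] -= lr * xi * gk
    return loss

def ema_update(ema_w, student_w, momentum):
    # Exponential moving average of the student weights, used only for evaluation.
    for ema_row, row in zip(ema_w, student_w):
        for k, w in enumerate(row):
            ema_row[k] = momentum * ema_row[k] + (1 - momentum) * w

random.seed(0)
D, K, N = 4, 3, 6  # feature dim, output dim, patch tokens per image (toy sizes)
teacher_w = [[random.gauss(0, 1) for _ in range(K)] for _ in range(D)]  # stands in for a pretrained teacher
teacher_before = [row[:] for row in teacher_w]
student_w = [[0.0] * K for _ in range(D)]
ema_w = [row[:] for row in student_w]
tokens = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]

losses = []
for _ in range(200):
    losses.append(student_step(student_w, teacher_w, tokens, lr=0.1, t_temp=0.5, s_temp=1.0))
    ema_update(ema_w, student_w, momentum=0.99)

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Passing a boolean `token_mask` to `token_loss` restricts the loss to a masked subset of tokens, which makes the contrast in the third bullet easy to inspect. In a real setup the teacher would be a pretrained ViT with its heads and the student would see augmented crops; this sketch only shows the shape of the training loop.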

qasfb, Sep 01 '23 12:09

Thanks for the reply! I am also interested in this. So there are no plans to add this missing code?

ykaganov, Sep 06 '23 09:09

Thanks! @qasfb, I'm wondering: do we need the DINO loss in addition to the masked-image-modeling loss during distillation?

memoiry, Sep 08 '23 16:09

Hi @qasfb

> apply the masked-image-modeling loss to all the tokens (not only the subset that is masked)

Could you explain why the MIM loss needs to be applied to all the tokens in this case?

amundra15, Jul 16 '24 11:07