dinov2
about model architecture design
Two questions:
- I noticed that in vitl14.yaml you set:

  ```yaml
  dino:
    head_n_prototypes: 131072
    head_bottleneck_dim: 384
  ```

  If I understand correctly, this is just a linear layer. What's the reasoning behind the extreme ratio of input to output channels? Is it operating under the assumption that the prototypes are trying to approximate a one-hot distribution, so it's 131072 bits vs. 384 floating-point numbers?
- Do you have plans to release the prototype heads? I'm trying to adapt the model to the food domain by continuing from your released checkpoints and doing further SSL on Recipe1M+, which has 14M images. Ideally I could resume directly from the teacher/student models. If I understand correctly, the currently released weights are only the teacher backbones, right?
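
For context on the scale of the first question: treating the prototype head as a plain linear map from the 384-dim bottleneck to 131072 prototype logits (an assumption based on the config values above, not a claim about the exact layer structure in the repo), a quick back-of-the-envelope parameter count in plain Python:

```python
# Config values quoted from vitl14.yaml in the question above.
head_bottleneck_dim = 384
head_n_prototypes = 131072

# A bias-free linear layer bottleneck -> prototypes would hold this many weights.
params = head_bottleneck_dim * head_n_prototypes
print(f"{params:,} weights (~{params / 1e6:.0f}M parameters)")
```

So the prototype layer alone would carry roughly 50M parameters, which is part of why releasing (or not releasing) the heads matters for resuming SSL training.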