dinov2

ConvNext

Open SimJeg opened this issue 2 years ago • 1 comments

Hello,

Have you considered using the ConvNext architecture for training DINOv2?

ConvNext has been shown to deliver better performance and lower latency in settings such as CLIP. For example, in the open_clip repository, ConvNext-L@320 achieves a +1.4% gain in zero-shot accuracy over ViT-L@336 and is more than twice as fast.

While ViT may be easier to use in a "tokenize everything for my transformer world" approach, it's worth considering that CNNs still deserve... attention ^^

Best, Simon

SimJeg, Apr 18 '23 09:04

No, we have not (yet), but we might consider alternative architectures for the distilled models.

patricklabatut, Apr 24 '23 23:04

Closing to keep track of similar asks in #166 instead.

patricklabatut, Aug 23 '23 21:08