dinov2 ConvNext

ConvNext

Open SimJeg opened this issue 2 years ago • 1 comments

Hello,

Have you considered using the ConvNext architecture for training DINOv2?

ConvNext has been shown to deliver better performance and lower latency in settings such as CLIP. For example, in the open_clip repository, ConvNext-L@320 achieves +1.4% higher zero-shot accuracy than ViT-L@336 while being more than twice as fast.

While ViT may be easier to use in a "tokenize everything for my transformer world" approach, it's worth considering that CNNs still deserve... attention ^^

Best, Simon

Apr 18 '23 09:04 SimJeg

No, we have not (yet), but we might consider alternative architectures for the distilled models.

Apr 24 '23 23:04 patricklabatut

Closing to keep track of similar asks in #166 instead

Aug 23 '23 21:08 patricklabatut
