dinov2
How to use a DINOv2 pretrained ViT model for a downstream task?
Thanks for your great work; it really impresses me! I want to try it in my research. Specifically, I want to use ViT-small to replace the ImageNet-pretrained backbone for a monocular 3D object detection task. The parameter counts of these two networks are comparable, and I expected the DINOv2-pretrained ViT-small to perform better. However, the results show that performance with the DINOv2-pretrained ViT-small is 20% lower, and the loss struggles to converge. Since I have already tuned the learning rate, what else can I do to make the ViT backbone usable?