dinov2
Do feature vectors for DINOv2 include small objects?
Hi,
When I visualize the features via PCA, I can see small objects, but I'm not sure whether this means the 1024-dimensional feature vector of DINOv2 ViT-L encodes the spatial information of small objects relative to larger objects in the image.
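One point worth checking: PCA visualizations of DINOv2 features are usually computed over the patch tokens (one vector per 14x14 pixel patch), not over the single pooled vector, so seeing small objects in the PCA map may say little about what survives in the 1024-d global embedding. The sketch below shows that kind of visualization, assuming ViT-L/14 on a 224x224 input (a 16x16 grid of 1024-d tokens). Random numbers stand in for real features; with the facebookresearch/dinov2 code, `model.forward_features(x)["x_norm_patchtokens"]` should give real ones, but verify that key against your installed version. `pca_rgb` is a hypothetical helper name.

```python
import numpy as np

def pca_rgb(patch_tokens: np.ndarray, grid_h: int, grid_w: int) -> np.ndarray:
    """Project (N, D) patch tokens onto their first 3 principal components
    and reshape to a (grid_h, grid_w, 3) image with values in [0, 1]."""
    assert patch_tokens.shape[0] == grid_h * grid_w
    # Center each feature dimension before PCA
    centered = patch_tokens - patch_tokens.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, sorted by singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt[:3].T  # (N, 3)
    # Min-max normalize each component to [0, 1] so it can be shown as RGB
    lo = proj.min(axis=0, keepdims=True)
    hi = proj.max(axis=0, keepdims=True)
    rgb = (proj - lo) / np.maximum(hi - lo, 1e-8)
    # Patch order is row-major, so this restores the spatial layout
    return rgb.reshape(grid_h, grid_w, 3)

# Stand-in for ViT-L/14 patch tokens of a 224x224 image (assumption: random data)
rng = np.random.default_rng(0)
tokens = rng.standard_normal((16 * 16, 1024)).astype(np.float32)
img = pca_rgb(tokens, 16, 16)
print(img.shape)  # (16, 16, 3)
```

The spatial structure in such a map comes from the reshape back to the patch grid; a single pooled vector has no grid to reshape.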
Also, how should I reason about when to use the patch tokens versus the final embedding vector that the model returns? I am trying to build a video object detection model that predicts accurate bounding boxes across frames.
Currently, I just use the final 1024-dimensional vector, but I'm not sure whether I should use the patch tokens instead, since that would be a lot of data when operating on video.
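To put "a lot of data" in numbers: with a patch size of 14, an HxW frame yields (H/14)*(W/14) patch tokens of 1024 dims each, versus one 1024-d vector for the global embedding. The sketch below computes that cost and shows one possible compromise, average-pooling the patch grid to a coarser resolution before feeding a detection head. The pooling step is a hypothetical design choice, not something DINOv2 provides; the function names, the 448x448 resolution, the 16-frame clip length, and fp16 storage are all illustrative assumptions.

```python
import numpy as np

PATCH = 14  # ViT-L/14 patch size
DIM = 1024  # ViT-L embedding width

def tokens_per_frame(h: int, w: int, patch: int = PATCH) -> int:
    # Number of patch tokens for an h x w input (assumes h, w divisible by patch)
    return (h // patch) * (w // patch)

def pool_patch_grid(tokens: np.ndarray, grid_h: int, grid_w: int, k: int) -> np.ndarray:
    """Average-pool an (N, D) row-major patch grid with a k x k window,
    returning (grid_h // k * grid_w // k, D) tokens."""
    d = tokens.shape[1]
    grid = tokens.reshape(grid_h, grid_w, d)
    gh, gw = grid_h // k, grid_w // k
    grid = grid[: gh * k, : gw * k]  # drop any ragged edge rows/columns
    pooled = grid.reshape(gh, k, gw, k, d).mean(axis=(1, 3))
    return pooled.reshape(gh * gw, d)

h, w, frames = 448, 448, 16
n = tokens_per_frame(h, w)  # 32 * 32 = 1024 tokens per frame
patch_mib = n * DIM * frames * 2 / 2**20  # fp16 patch tokens per clip
global_kib = DIM * frames * 2 / 2**10     # fp16 global vectors per clip
print(n, patch_mib, global_kib)  # 1024 32.0 32.0

pooled = pool_patch_grid(np.zeros((n, DIM), dtype=np.float32), 32, 32, 2)
print(pooled.shape)  # (256, 1024)
```

Under these assumptions a clip costs about 32 MiB of patch tokens versus 32 KiB of global vectors, a 1024x difference. Pooling by 2 cuts the patch cost by 4x while keeping a coarse spatial grid, which a box-regression head generally needs and a single global vector does not provide.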