dinov2

Do feature vectors for DINOv2 include small objects?

Open smandava98 opened this issue 3 months ago • 5 comments

Hi,

When I visualize the features via PCA, I can see small objects. Does this mean the 1024-dimensional feature vector from DINOv2 ViT-L also encodes spatial information about small objects relative to larger objects in the image?

Also, how should I reason about when to use the patch tokens versus the final embedding vector that the model returns? I am trying to use DINOv2 to build a video object detection model that predicts accurate bounding boxes across frames.

Currently, I just use the final 1024-dimensional vector. I'm not sure whether I should use the patch tokens instead, since that would be a lot of data per frame when operating on video.
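
To put a rough number on "a lot", here is a minimal sketch of the per-frame size difference. It assumes ViT-L/14 (patch size 14, 1024-dimensional tokens), float32 storage, and a 224x224 input; the input size and the helper name `token_budget` are illustrative assumptions, not something taken from the DINOv2 code.

```python
def token_budget(height, width, patch_size=14, embed_dim=1024, bytes_per_value=4):
    """Return (num_patch_tokens, patch_token_bytes, cls_token_bytes) for one frame.

    Assumes a ViT that splits the image into non-overlapping square patches
    and emits one embed_dim-sized token per patch plus one global token.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be multiples of patch_size")
    # One token per patch in the (height / patch_size) x (width / patch_size) grid.
    num_patches = (height // patch_size) * (width // patch_size)
    patch_bytes = num_patches * embed_dim * bytes_per_value
    cls_bytes = embed_dim * bytes_per_value
    return num_patches, patch_bytes, cls_bytes


# Hypothetical 224x224 frame: a 16x16 grid of patches.
num_patches, patch_bytes, cls_bytes = token_budget(224, 224)
print(num_patches, patch_bytes, cls_bytes)
```

Under these assumptions the patch tokens take 256 times the storage of the single global vector per frame, which is the cost I am weighing against keeping the spatial layout.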

smandava98, Mar 23 '24 04:03