dinov2

Questions about feature maps/patch size

Open smandava98 opened this issue 8 months ago • 1 comment

I'm still not entirely sure how ViTs learn various features, as I've always understood convnets better intuitively. How can I get feature maps that include color? For my use case, color is important. forward_features seems to extract the feature map from the last layer, but is it also possible to get some of the earlier layers, which I presume would have color? Or do ViT backbone feature maps already have color info, edges, and similar low-level cues incorporated in the output feature map?
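As a rough sketch of the "earlier layers" part of this question: the DINOv2 backbones expose a `get_intermediate_layers` method that accepts either a count of final blocks or a list of block indices. The helper below wraps that call. The helper name `feature_maps_from_blocks`, the chosen indices `(2, 5, 8, 11)`, and the random input in `demo` are illustrative assumptions, not something taken from the repository's documentation, and nothing here shows which block (if any) preserves color best.

```python
def feature_maps_from_blocks(model, images, block_indices=(2, 5, 8, 11)):
    """Collect patch-feature maps from selected transformer blocks.

    `model` is assumed to be a DINOv2 backbone exposing
    get_intermediate_layers(x, n, reshape, return_class_token, norm).
    Returns a dict {block_index: feature_map}.
    """
    # The method returns outputs in ascending block order,
    # so sort the indices before pairing them with the outputs.
    indices = sorted(block_indices)
    outputs = model.get_intermediate_layers(
        images,
        n=indices,      # a list selects specific blocks; an int selects the last n
        reshape=True,   # (B, C, H/patch, W/patch) instead of (B, tokens, C)
        norm=True,      # apply the backbone's final LayerNorm to each output
    )
    return dict(zip(indices, outputs))


def demo():
    # Hypothetical usage; downloads weights through torch.hub on first call.
    import torch

    model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()
    images = torch.randn(1, 3, 224, 224)  # stand-in for a normalized image batch
    with torch.no_grad():
        maps = feature_maps_from_blocks(model, images)
    for idx, fmap in maps.items():
        print(idx, tuple(fmap.shape))
```

Calling `demo()` prints one shape per selected block. Comparing features from an early block against the last one on images that differ only in color would be one way to check empirically where color information survives.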

Also, what is the recommended patch size for in-the-wild images/videos? ImageNet preprocessing seems to use 224 with center cropping, but in some of my own images the edges of the image are important. I think I could get away without any cropping, but I am unsure about patch size and how different sizes would affect performance.
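One point that may help frame the question: in the released DINOv2 checkpoints the patch size is fixed at 14 (the `14` in names such as `dinov2_vits14`), so the free choice is the input resolution. As far as I can tell, the patch embedding expects height and width to be multiples of the patch size. A minimal sketch of computing a crop-free target size follows; the helper names `padded_size` and `token_grid` are hypothetical, and whether to pad or resize up to that size is left open.

```python
PATCH = 14  # patch size of the released DINOv2 checkpoints


def padded_size(height, width, patch=PATCH):
    """Round each side up to the next multiple of `patch` (no cropping)."""
    round_up = lambda side: -(-side // patch) * patch  # integer ceiling
    return round_up(height), round_up(width)


def token_grid(height, width, patch=PATCH):
    """Number of patch tokens along each axis after rounding up."""
    h, w = padded_size(height, width, patch)
    return h // patch, w // patch


print(padded_size(480, 640))  # (490, 644)
print(token_grid(480, 640))   # (35, 46)
print(token_grid(224, 224))   # (16, 16)
```

A 480x640 frame therefore yields 35 x 46 = 1610 patch tokens versus 256 for a 224x224 center crop, so keeping the image edges costs more memory and compute per image.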

smandava98, Oct 08 '23 08:10