dinov2 Questions about feature maps/patch size

Open smandava98 opened this issue 8 months ago • 1 comment

I'm still not entirely sure how ViTs learn various features, as I've always understood convnets more intuitively. How can I get feature maps that include color? For my use case, color is important. forward_features seems to extract the feature map from the last layer, but is it also possible to get some of the earlier layers, which I presume still carry color information? Or do ViT backbone feature maps already incorporate color, edges, and similar low-level cues in the output feature map?
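For reference, a minimal sketch of pulling feature maps from more than one depth. It assumes the model is a DINOv2 backbone exposing get_intermediate_layers (as the DINOv2 vision transformer class does); the helper name multi_depth_features and the n_last parameter are hypothetical, introduced only for illustration. Whether earlier blocks keep more color information than the last one is something to check empirically, not something this sketch establishes.

```python
import torch


def multi_depth_features(model, images, n_last=4):
    """Return patch-feature maps from the last `n_last` transformer blocks.

    Assumption: `model` is a DINOv2 backbone with `get_intermediate_layers`.
    With reshape=True each returned tensor is laid out as
    (batch, channels, height // patch, width // patch).
    """
    model.eval()
    with torch.no_grad():
        return model.get_intermediate_layers(images, n=n_last, reshape=True)


# Usage (left commented out because it downloads weights on first call):
# model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
# images = torch.rand(1, 3, 224, 224)  # placeholder; real inputs should be normalized
# for fmap in multi_depth_features(model, images, n_last=4):
#     print(fmap.shape)
```

In DINOv2 the n argument can also be a sequence of block indices, which is the way to ask for specific earlier blocks instead of only the last few.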

Also, what is the recommended patch size for in-the-wild images/videos? ImageNet seems to use 224 with center cropping, but in some of my own images the edges are important. I think I could get away without any cropping, but I am unsure about patch size and how different choices would affect performance.
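A note on the arithmetic, as a sketch: the released DINOv2 backbones use a fixed patch size of 14, so what varies in practice is the input resolution, which should be a multiple of 14. The helper below (fit_to_patch_grid is a hypothetical name) computes the largest such size for a given image, so the image can be resized to it instead of center-cropped. It makes no claim about which resolution performs best.

```python
PATCH = 14  # patch size of the released DINOv2 ViT backbones


def fit_to_patch_grid(height, width, patch=PATCH):
    """Largest size not exceeding (height, width) that is a multiple of `patch`.

    Returns (new_height, new_width, grid_rows, grid_cols), where the grid
    is the number of patches along each axis.
    """
    if height < patch or width < patch:
        raise ValueError("image is smaller than a single patch")
    rows, cols = height // patch, width // patch
    return rows * patch, cols * patch, rows, cols


print(fit_to_patch_grid(224, 224))  # (224, 224, 16, 16)
print(fit_to_patch_grid(480, 640))  # (476, 630, 34, 45)
```

Resizing a 480x640 frame to 476x630 changes each side by at most 13 pixels and keeps the image borders in view, at the cost of a slight aspect-ratio change.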

Oct 08 '23 08:10 smandava98
