dinov2: Understanding the difference between CLS features vs PATCH features.

barbolo opened this issue 1 year ago • 6 comments

Hi, first of all, thanks for the great work on DINOv2.

Imagine that I want to find dogs (and their positions) among several images in my dataset.

I have the DINOv2 CLS feature of an image of a dog.
I have the DINOv2 patch features of each image in my dataset.

I can confirm that I'm able to find the images containing dogs in my dataset by computing a similarity score (e.g. the dot product) between the CLS feature of the dog image and the patch features of each dataset image. It works.

What I'm trying to find out is whether this result is just a coincidence or whether DINOv2 was designed to behave this way. I've skimmed through the paper and couldn't find the answer.

Thank you.

Sep 20 '23 20:09 barbolo
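
A minimal sketch of the search described above, assuming the CLS feature of the query image and the patch features of a dataset image are already available as NumPy arrays. With the DINOv2 torch hub models, `forward_features` returns a dict whose `x_norm_clstoken` and `x_norm_patchtokens` entries hold these, but check this against the version you use. The sketch uses cosine similarity (the dot product after L2-normalising both sides) rather than the raw dot product. The helper names `patch_similarity_map` and `score_image`, and the 16x16 grid (a 224x224 input with patch size 14), are illustrative assumptions, not part of DINOv2.

```python
import numpy as np


def patch_similarity_map(cls_query, patch_tokens, grid_hw):
    """Cosine similarity between one CLS vector and every patch token of an image.

    cls_query:    (D,) CLS feature of the query image (e.g. a dog photo)
    patch_tokens: (N, D) patch features of one dataset image, N = h * w
    grid_hw:      (h, w) patch grid, e.g. (16, 16) for 224x224 input and patch size 14
    """
    q = cls_query / np.linalg.norm(cls_query)
    p = patch_tokens / np.linalg.norm(patch_tokens, axis=1, keepdims=True)
    sims = p @ q                      # (N,) one score per patch
    return sims.reshape(grid_hw)      # (h, w) heatmap over the image (row-major patches)


def score_image(sim_map):
    """Image-level score: value and (row, col) of the best-matching patch."""
    row, col = np.unravel_index(np.argmax(sim_map), sim_map.shape)
    return float(sim_map[row, col]), (int(row), int(col))


if __name__ == "__main__":
    # Synthetic demo: random "patch features" with the query direction planted at patch (5, 7).
    rng = np.random.default_rng(0)
    d, h, w = 384, 16, 16             # 384 = ViT-S/14 embedding size
    cls_query = rng.normal(size=d)
    patch_tokens = rng.normal(size=(h * w, d))
    patch_tokens[5 * w + 7] = 3.0 * cls_query
    sim_map = patch_similarity_map(cls_query, patch_tokens, (h, w))
    score, pos = score_image(sim_map)
    print(pos, round(score, 3))       # expect (5, 7) and 1.0
```

Ranking dataset images by `score` gives the image-level search, and the argmax position (or the whole heatmap, upsampled by the patch size) gives a rough location. As qasfb notes below, CLS and patch tokens are not expected to align, so treat this as an empirical observation rather than a guaranteed property.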

Coincidence, I would say.

Sep 21 '23 11:09 qasfb

I've seen this kind of result across different classes of images (animals, objects, plants). Maybe this behavior emerged unintentionally?

Sep 21 '23 12:09 barbolo

That's very much a possibility; at no point do we expect the CLS and patch tokens to align, though!

Sep 21 '23 13:09 qasfb

Hi @barbolo

I'm trying to understand the difference between the CLS token and patch features. Could you point me to some material? I know that the CLS token is used as an embedding for classification, for example, but I don't know what patch features can be used for.

Thanks

Mar 08 '24 15:03 eric-vision-e

@eric-vision-e you might want to take a look at the demos (dense matching, sparse matching) at the link below:

https://dinov2.metademolab.com/

Mar 08 '24 15:03 barbolo
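
To illustrate what patch features are typically used for, here is a generic sparse-matching sketch: patches from two images are matched by mutual nearest neighbours in cosine similarity, so each kept pair (i, j) links a region of image A to a region of image B. This illustrates the idea behind matching with patch features; it is not the code behind the demo site. The function name `mutual_nn_matches` is hypothetical, and the inputs are assumed to be (N, D) patch-feature arrays for single images (e.g. `x_norm_patchtokens` with the batch dimension removed).

```python
import numpy as np


def mutual_nn_matches(patches_a, patches_b):
    """Sparse matching sketch: keep patch pairs that are each other's nearest neighbour.

    patches_a: (Na, D) and patches_b: (Nb, D) patch features of two images.
    Returns a list of (index_in_a, index_in_b, cosine_similarity).
    """
    a = patches_a / np.linalg.norm(patches_a, axis=1, keepdims=True)
    b = patches_b / np.linalg.norm(patches_b, axis=1, keepdims=True)
    sim = a @ b.T                    # (Na, Nb) cosine similarities
    nn_ab = sim.argmax(axis=1)       # best match in B for each patch of A
    nn_ba = sim.argmax(axis=0)       # best match in A for each patch of B
    matches = []
    for i, j in enumerate(nn_ab):
        if nn_ba[j] == i:            # mutual check filters ambiguous one-way matches
            matches.append((i, int(j), float(sim[i, j])))
    return matches


if __name__ == "__main__":
    # Synthetic check: image B's patches are a shuffled, slightly noisy copy of image A's.
    rng = np.random.default_rng(0)
    patches_a = rng.normal(size=(64, 384))
    perm = rng.permutation(64)
    patches_b = patches_a[perm] + 0.01 * rng.normal(size=(64, 384))
    matches = mutual_nn_matches(patches_a, patches_b)
    print(len(matches))              # expect 64: every patch finds its shuffled counterpart
```

A patch index k maps back to grid cell (k // w, k % w) for a patch grid of width w, which is how matches can be drawn as lines between the two images. A dense variant would keep the nearest neighbour of every patch instead of applying the mutual check.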

Hi @barbolo,

OK, thanks. I understand now.

Mar 10 '24 18:03 eric-vision-e
