dinov2 Confused on how to use pretrained backbone where you need full details of the image

Open: smandava98 opened this issue 5 months ago • 1 comment

Hi,

Thank you for the great work. I am conceptually confused about how to use this backbone. I read through your code for the depth estimation head, but it is still not clear to me how to use the forward features effectively. Are these features projected somehow?

Where in the code do you take this (batch size, num features, feature dim) map and flatten it, or otherwise turn it into a vector?
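
For reference, here is a minimal sketch of the shape bookkeeping this question is about. It is not the repository's code. It assumes the backbone returns patch tokens of shape (batch size, num patches, feature dim) plus one CLS vector per image (in DINOv2 these appear to be the `x_norm_patchtokens` and `x_norm_clstoken` entries of the `forward_features` output), and it uses random NumPy arrays as stand-ins so it runs without the model. The sizes and the helper names `tokens_to_feature_map` and `tokens_to_vector` are made up for illustration. The same reshapes apply to PyTorch tensors via `reshape`, `permute`, and `mean`.

```python
import numpy as np

# Assumed setup: 224x224 input, patch size 14 -> 16x16 = 256 patches, dim 384.
batch, grid_h, grid_w, dim = 2, 16, 16, 384
rng = np.random.default_rng(0)

# Random stand-ins for the backbone outputs (not real features).
patch_tokens = rng.standard_normal((batch, grid_h * grid_w, dim)).astype(np.float32)
cls_token = rng.standard_normal((batch, dim)).astype(np.float32)


def tokens_to_feature_map(tokens, grid_h, grid_w):
    """(B, N, D) patch tokens -> (B, D, H, W) spatial map for dense heads.

    Assumes tokens are in row-major patch order, so N == H * W.
    """
    b, n, d = tokens.shape
    assert n == grid_h * grid_w, "token count must match the patch grid"
    return tokens.reshape(b, grid_h, grid_w, d).transpose(0, 3, 1, 2)


def tokens_to_vector(tokens):
    """(B, N, D) patch tokens -> (B, D) by averaging over the patches."""
    return tokens.mean(axis=1)


feature_map = tokens_to_feature_map(patch_tokens, grid_h, grid_w)  # dense tasks
pooled = tokens_to_vector(patch_tokens)                            # one vector per image
# One common choice for a global descriptor: CLS concatenated with the pooled patches.
global_vec = np.concatenate([cls_token, pooled], axis=-1)

print(feature_map.shape, pooled.shape, global_vec.shape)
```

A dense head (depth, segmentation) would consume `feature_map`, while a classifier or retrieval setup would consume `cls_token`, `pooled`, or `global_vec`.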

I am generally confused about how you process backbone feature maps. I am also unsure how they can be used in image<>text sequence models, where a vector representation of the image is needed.
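
For the image<>text case, one widely used pattern (an assumption here, not something this repository prescribes) is to keep the patch tokens as a sequence, map each one to the text model's embedding width with a learned linear layer, and concatenate the result with the text embeddings. The sketch below shows only the shapes. The weights are random stand-ins for a layer that would be trained in practice, and all sizes and the name `project_image_tokens` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, num_patches, vis_dim = 2, 256, 384  # assumed backbone output shape
text_len, text_dim = 12, 512               # hypothetical text-model sizes

# Random stand-ins for backbone patch tokens and text token embeddings.
patch_tokens = rng.standard_normal((batch, num_patches, vis_dim)).astype(np.float32)
text_embeds = rng.standard_normal((batch, text_len, text_dim)).astype(np.float32)

# Stand-in for a learned linear projection from vis_dim to text_dim.
weight = (rng.standard_normal((vis_dim, text_dim)) / np.sqrt(vis_dim)).astype(np.float32)
bias = np.zeros(text_dim, dtype=np.float32)


def project_image_tokens(tokens, weight, bias):
    """Map (B, N, vis_dim) image tokens to (B, N, text_dim)."""
    return tokens @ weight + bias


image_embeds = project_image_tokens(patch_tokens, weight, bias)
# Prepend the image tokens to the text tokens along the sequence axis.
sequence = np.concatenate([image_embeds, text_embeds], axis=1)

print(sequence.shape)  # (2, 268, 512)
```

If the sequence model needs a single image vector rather than a token sequence, the same projection can be applied to the CLS token or to the mean of the patch tokens instead.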

Jan 14 '24 13:01 smandava98
