Confusion about feature shape in salad forward?

Open chennuo0125-HIT opened this issue 8 months ago • 1 comments

DINOv2's output should have N features, so I think the feature shape should be [B, N, C, H // 14, W // 14]?

chennuo0125-HIT avatar Apr 18 '25 08:04 chennuo0125-HIT

Hi @chennuo0125-HIT

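The N features from DINOv2 are actually the (H // 14) * (W // 14) patches, so N is not a separate dimension. For every patch of the input image, DINOv2 returns a C-dimensional vector, so for every image it returns a [C, N] tensor, which holds the same data as [C, H // 14, W // 14] once N is unfolded into the patch grid. With the batch dimension, that gives [B, C, H // 14, W // 14].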

serizba avatar Jul 18 '25 08:07 serizba
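
To make the shape relationship in the reply above concrete, here is a minimal pure-Python sketch. It is not code from the repository. The helper names `grid_shape` and `tokens_to_grid` and the 224 x 322 image size are hypothetical, and it assumes the patch tokens are ordered row by row (left to right, top to bottom). Integers stand in for the C-dimensional feature vectors.

```python
PATCH = 14  # patch size used in the thread (H // 14, W // 14)


def grid_shape(height, width, patch=PATCH):
    """Return (rows, cols) of the patch grid for an image of the given size."""
    return height // patch, width // patch


def tokens_to_grid(tokens, height, width, patch=PATCH):
    """Arrange a flat list of N patch tokens into a rows x cols grid.

    Assumes row-major token order (an assumption of this sketch).
    """
    rows, cols = grid_shape(height, width, patch)
    assert len(tokens) == rows * cols, "expected exactly one token per patch"
    return [tokens[r * cols:(r + 1) * cols] for r in range(rows)]


# Hypothetical image size, chosen only for illustration.
H, W = 224, 322
rows, cols = grid_shape(H, W)
N = rows * cols  # N is the number of patches, not an extra dimension

tokens = list(range(N))  # stand-in for N feature vectors of length C
grid = tokens_to_grid(tokens, H, W)

# Token n sits at grid[n // cols][n % cols]; nothing is added or dropped.
print(rows, cols, N)
```

In a tensor library the same step would typically be a reshape of the N axis into two axes (plus a permute if the channel axis has to move), which is why the result is [B, C, H // 14, W // 14] rather than [B, N, C, H // 14, W // 14].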