dinov2
Semantic segmentation
I can't find any code for semantic segmentation. The paper says:
"a linear layer is trained to predict class logits from a patch tokens. It is used to produce a low-resolution logit map (eg 32x32 for a model with patch size 16), which is then upsampled to full resolution (512x512) to obtain a segmentation map."
Does this mean a linear layer with 32*32 = 1024 output classes needs to be trained? And what about n_last_blocks_list = [1, 4] and n_last_blocks = max(n_last_blocks_list)? Do those need to be changed to n_last_blocks_list = [1, 1] and n_last_blocks = max(n_last_blocks_list)?
Is there any sample code for semantic segmentation?
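For reference, here is how I currently read the quoted description, as a minimal PyTorch sketch rather than the authors' implementation. Under this reading the linear layer is applied to each patch token separately, so it has one output per segmentation class, and the 32x32 comes from the patch grid, not from the layer's output size. The class name `LinearSegHead`, the embedding size 384, and the 21 classes are my own assumptions, and a random tensor stands in for the backbone's patch tokens.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LinearSegHead(nn.Module):
    """Hypothetical head: per-patch linear classifier + bilinear upsampling."""

    def __init__(self, embed_dim: int, num_classes: int):
        super().__init__()
        # One linear layer shared by all patches: embed_dim -> num_classes.
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, patch_tokens, grid_hw, out_hw):
        # patch_tokens: (B, N, C), where N == grid_h * grid_w
        b, n, _ = patch_tokens.shape
        gh, gw = grid_hw
        assert n == gh * gw, "token count must match the patch grid"
        logits = self.classifier(patch_tokens)  # (B, N, num_classes)
        # Rearrange into a low-resolution logit map: (B, num_classes, gh, gw)
        logits = logits.transpose(1, 2).reshape(b, -1, gh, gw)
        # Upsample the logit map to the full image resolution.
        return F.interpolate(logits, size=out_hw, mode="bilinear", align_corners=False)


# Assumed values for illustration only.
embed_dim, num_classes = 384, 21
image_size, patch_size = 512, 16
grid = image_size // patch_size  # 32 patches per side

head = LinearSegHead(embed_dim, num_classes)
patch_tokens = torch.randn(2, grid * grid, embed_dim)  # stand-in for backbone output
full_logits = head(patch_tokens, (grid, grid), (image_size, image_size))
seg_map = full_logits.argmax(dim=1)  # per-pixel class indices, (2, 512, 512)
```

If this reading is right, training would apply a per-pixel cross-entropy loss to `full_logits` with the backbone frozen, but I would appreciate confirmation.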