
Two questions about semantic segmentation with DINOv2

Open • ydhongHIT opened this issue 1 year ago • 1 comment

Hi, thanks for your great work! I have two questions. First, I noticed in your paper that DINOv2 pre-trained at high resolution performs better as the resolution increases, for instance from 512 to 640. Does this mean the model can adapt to different resolutions for semantic segmentation? If so, is there any insight behind this phenomenon? To my knowledge, this is easy for CNNs but not trivial for ViTs. Second, how did you make the input size a multiple of 14? I saw that you defined the CenterPadding class in the notebooks. So do you pad rather than resize? Does this have any impact on performance?
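To make the padding question concrete, here is a minimal sketch of the arithmetic involved in center-padding an image so that each side becomes a multiple of the patch size (14). The function names `center_pad_amounts` and `padded_shape` are hypothetical and chosen for illustration; this is not the notebook's CenterPadding code, only the general idea under the assumption that the total padding is split as evenly as possible between the two sides.

```python
import math


def center_pad_amounts(size, multiple=14):
    """Return (before, after) padding that rounds `size` up to a multiple.

    Hypothetical helper: the total padding is split as evenly as possible,
    with any odd pixel going to the trailing side (an assumption).
    """
    target = math.ceil(size / multiple) * multiple
    total = target - size
    before = total // 2
    return before, total - before


def padded_shape(height, width, multiple=14):
    """Shape of an image after center-padding both spatial dimensions."""
    top, bottom = center_pad_amounts(height, multiple)
    left, right = center_pad_amounts(width, multiple)
    return height + top + bottom, width + left + right
```

Under these assumptions, a 512-pixel side is padded to 518 (37 patches) and a 640-pixel side to 644 (46 patches), so padding adds at most 13 pixels per dimension and leaves the original pixels unscaled, whereas resizing would change the scale of the image content.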

ydhongHIT, Oct 14 '23 11:10

The Resize_pos_embed function is not used here. What should be done when the pre-training resolution differs from the downstream input resolution?
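For context, the usual way to handle such a mismatch in ViTs is to interpolate the grid of patch position embeddings to the new grid size. The sketch below shows that idea in plain Python. The name `resize_pos_embed_sketch` is hypothetical, and the choice of bilinear interpolation with endpoints mapped to endpoints is an assumption made for brevity. Real implementations typically rely on a library routine such as `torch.nn.functional.interpolate` (often in bicubic mode) and handle the class token separately.

```python
def resize_pos_embed_sketch(grid, new_h, new_w):
    """Bilinearly resize a patch position-embedding grid.

    grid: nested list of shape [old_h][old_w][dim] holding floats
          (the class-token embedding is assumed to be excluded).
    Returns a nested list of shape [new_h][new_w][dim].
    """
    old_h, old_w = len(grid), len(grid[0])
    dim = len(grid[0][0])

    def source_coord(i, new_n, old_n):
        # Map output index i to a fractional input index so that the
        # first and last positions line up (an assumed convention).
        if new_n == 1:
            return 0.0
        return i * (old_n - 1) / (new_n - 1)

    out = []
    for i in range(new_h):
        y = source_coord(i, new_h, old_h)
        y0 = int(y)
        y1 = min(y0 + 1, old_h - 1)
        wy = y - y0
        row = []
        for j in range(new_w):
            x = source_coord(j, new_w, old_w)
            x0 = int(x)
            x1 = min(x0 + 1, old_w - 1)
            wx = x - x0
            # Weighted average of the four neighbouring embeddings.
            vec = [
                (1 - wy) * ((1 - wx) * grid[y0][x0][d] + wx * grid[y0][x1][d])
                + wy * ((1 - wx) * grid[y1][x0][d] + wx * grid[y1][x1][d])
                for d in range(dim)
            ]
            row.append(vec)
        out.append(row)
    return out
```

With this kind of resizing, a model pre-trained on a 37×37 patch grid (518 pixels at patch size 14) could in principle be given a 46×46 grid (644 pixels); whether accuracy holds up at the new resolution is an empirical matter.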

ydhongHIT, Oct 14 '23 12:10