two questions about semantic segmentation with dinov2

Open ydhongHIT opened this issue 1 year ago • 1 comment

Hi, thanks for your great work! I have two questions. First, I observed in your paper that dinov2 pre-trained at high resolution performs better as the input resolution increases, for instance from 512 to 640. Does this mean the model can adapt to different resolutions for semantic segmentation? If so, is there any insight behind this behavior? To my knowledge, this is easy for CNNs but not trivial for ViTs. Second, how did you bring the input size to a multiple of 14? I saw that you defined the CenterPadding class in the notebooks. So do you pad the input rather than resize it? Does this have any impact on performance?

Oct 14 '23 11:10 ydhongHIT
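
To make the padding question concrete, here is a minimal sketch of the arithmetic behind center padding to a multiple of the patch size. It is not the notebook's CenterPadding class; the helper names `center_pad_amounts` and `padded_patch_grid` are hypothetical, and the only value taken from the post is the patch size of 14.

```python
import math


def center_pad_amounts(size, multiple=14):
    """Return (before, after) padding that rounds `size` up to a multiple."""
    new_size = math.ceil(size / multiple) * multiple
    pad = new_size - size
    before = pad // 2
    after = pad - before  # any odd remainder goes on the trailing side
    return before, after


def padded_patch_grid(height, width, patch_size=14):
    """Return the padded (H, W) and the resulting (rows, cols) patch grid."""
    top, bottom = center_pad_amounts(height, patch_size)
    left, right = center_pad_amounts(width, patch_size)
    padded_h = height + top + bottom
    padded_w = width + left + right
    return (padded_h, padded_w), (padded_h // patch_size, padded_w // patch_size)


for side in (512, 640):
    print(side, padded_patch_grid(side, side))
```

Under these assumptions a 512 input is padded to 518 (a 37 x 37 patch grid) and a 640 input to 644 (a 46 x 46 grid), so padding adds only a few border pixels and leaves the image content unscaled, whereas resizing would resample every pixel.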

The Resize_pos_embed function is not used here. What should be done when the pre-training resolution differs from the downstream input resolution?

Oct 14 '23 12:10 ydhongHIT
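
A common approach in ViT codebases for a resolution mismatch is to interpolate the patch position embeddings to the new patch grid. The sketch below illustrates that idea with PyTorch; it is not taken from the dinov2 repository, and the function name `resize_pos_embed_sketch`, the 37 x 37 source grid, the embedding width of 384, and the single class token are all assumptions chosen for illustration.

```python
import torch
import torch.nn.functional as F


def resize_pos_embed_sketch(pos_embed, new_grid, num_prefix_tokens=1):
    """Interpolate patch position embeddings to a new (rows, cols) grid.

    pos_embed has shape (1, num_prefix_tokens + side * side, dim); the
    source patch grid is assumed to be square.
    """
    prefix = pos_embed[:, :num_prefix_tokens]   # e.g. class token, kept as is
    patches = pos_embed[:, num_prefix_tokens:]
    num_patches, dim = patches.shape[1], patches.shape[2]
    side = int(round(num_patches ** 0.5))
    if side * side != num_patches:
        raise ValueError("expected a square source patch grid")

    # (1, N, dim) -> (1, dim, side, side) so the grid can be resampled in 2D
    grid = patches.reshape(1, side, side, dim).permute(0, 3, 1, 2)
    grid = F.interpolate(grid, size=new_grid, mode="bicubic", align_corners=False)

    # (1, dim, rows, cols) -> (1, rows * cols, dim)
    rows, cols = new_grid
    patches = grid.permute(0, 2, 3, 1).reshape(1, rows * cols, dim)
    return torch.cat([prefix, patches], dim=1)


# Illustrative shapes only: 37 x 37 source grid, 46 x 46 target grid.
pos_embed = torch.randn(1, 1 + 37 * 37, 384)
resized = resize_pos_embed_sketch(pos_embed, new_grid=(46, 46))
print(resized.shape)
```

The prefix token embedding is passed through unchanged and only the spatial grid is resampled. Whether bicubic or another interpolation mode matches a given checkpoint's training setup should be checked against the model code.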
