
Clarifications on High-Resolution Adaptation

JihwanEom opened this issue on Sep 09 '23 · 4 comments

Hello,

I'd like to ask for clarification about the high-resolution adaptation described in the paper. Section 4 and Appendix B.2 mention that the model was trained at a higher resolution (from 224 to 518) for 10k iterations at the end of pretraining, but I couldn't find the corresponding code in this repository.

  • Section 4 states:

    "Adapting the resolution (Touvron et al., 2019). Increasing image resolution is key to pixel-level downstream tasks such as segmentation or detection, where small objects disappear at low resolutions. However, training at high resolution is time and memory demanding, and instead, we increase the resolution of images to 518 × 518 during a short period at the end of pretraining."

  • Appendix B.2 mentions:

    "We initialise the model with the pretrained weights then train it for 10k iterations with the same procedure as the original pretraining. All the schedules are kept the same as in the original training, but compressed to fit in 10k iterations. All the hyperparameters are kept the same as in the first pretraining, except the base learning rate which is reduced."

  1. Code Availability: Is the high-resolution adaptation code not included in this repository's release?
  2. Details on "compressed to fit": Could you share more details on what "compressed to fit" means in practice? (This may also answer the third question; see my sketch below.)
  3. Batch Size & Learning Rate: It would be very helpful if you could share the specific batch size and learning rate used during this high-resolution adaptation phase.
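For reference, here is my current understanding of "compressed to fit", written as a minimal sketch. The concrete numbers (original iteration count, base learning rates, warmup length) are placeholders of my own, not values taken from the repository; I only mean to illustrate that the schedules keep their shape but are rescaled to a 10k-iteration horizon, with a reduced base learning rate.

```python
import math

def cosine_schedule(base_value, final_value, total_iters, warmup_iters=0, start_warmup_value=0.0):
    """Generic cosine schedule with linear warmup, returned as per-iteration values."""
    warmup = [
        start_warmup_value + (base_value - start_warmup_value) * i / max(warmup_iters, 1)
        for i in range(warmup_iters)
    ]
    decay_iters = total_iters - warmup_iters
    decay = [
        final_value + 0.5 * (base_value - final_value) * (1 + math.cos(math.pi * i / decay_iters))
        for i in range(decay_iters)
    ]
    return warmup + decay

# Original pretraining at 224x224: illustrative numbers only (iteration count,
# base LR, and warmup length are my assumptions, not the official values).
lr_pretrain = cosine_schedule(base_value=2e-3, final_value=1e-6,
                              total_iters=125_000, warmup_iters=12_500)

# High-resolution adaptation at 518x518: the same schedule shape, "compressed"
# into 10k iterations, with a smaller base learning rate (exact value unknown to me).
lr_adapt = cosine_schedule(base_value=2e-4, final_value=1e-6,
                           total_iters=10_000, warmup_iters=1_000)
```

Is this roughly the right interpretation, or does "compressed" mean something else (e.g., only some of the schedules are rescaled)?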

Thank you in advance!

JihwanEom · Sep 09 '23