Inquiry about DINOv2 Weights and Frozen DINO Experiment for VGGT Training
Thank you for your excellent work on VGGT and for making the code and models available! I'm currently studying the paper and implementation, and I have a couple of questions regarding the DINOv2 component that I'd be grateful if you could clarify.
- Specific DINOv2 Pretrained Weights Used:
The paper mentions using DINOv2 for image tokenization (Appendix B) and that it provided better performance (Section 5, Patchifying). Could you please specify which DINOv2 pretrained weights were used as the starting point for VGGT? For instance, from the common DINOv2 checkpoints, was it one of these or another variant?
```python
vit_models = {
    "dinov2_vitl14_reg": vit_large,
    "dinov2_vitb14_reg": vit_base,
    "dinov2_vits14_reg": vit_small,
    "dinov2_vitg14_reg": vit_giant,  # corrected from vitg2_reg, assuming standard DINOv2 names
}
```
Additionally, was the DINOv2 encoder initialized from these official DINOv2 pretrained weights and then fine-tuned jointly with VGGT, or was it trained from scratch as part of the VGGT training process?
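For concreteness, the initialize-then-jointly-fine-tune setup asked about here can be sketched with a toy model. The module names and learning rates below are illustrative assumptions, not VGGT's actual classes or hyperparameters:

```python
import torch
from torch import nn

# Toy stand-ins: `backbone` plays the role of the pretrained DINOv2
# encoder, `heads` the role of VGGT's freshly initialized prediction heads.
backbone = nn.Linear(16, 16)
heads = nn.Linear(16, 4)

# In the real setup the backbone would first be loaded from official
# DINOv2 weights (e.g. via torch.hub from facebookresearch/dinov2).
# Joint fine-tuning: every parameter trains, commonly with a smaller
# learning rate for the pretrained backbone than for the new heads.
optimizer = torch.optim.AdamW([
    {"params": backbone.parameters(), "lr": 1e-5},  # pretrained part
    {"params": heads.parameters(), "lr": 1e-4},     # new heads
])

# Nothing is frozen in this regime.
assert all(p.requires_grad for p in backbone.parameters())
```

Whether the paper used distinct learning rates per group is not stated here; the two-group optimizer is just one common way to realize "initialized from pretrained weights, then fine-tuned jointly".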
- Results of Experiment with Frozen DINO Parameters:
In a previous discussion (#66), it was mentioned:

> Hi David,
>
> The DINO parameters were not frozen during our training. I am trying if freezing them could also work
>
> Originally posted by @jytime in #66

I was wondering if you have any updates or results from this experiment? Specifically, how did the performance of VGGT compare when the DINOv2 encoder weights were kept frozen during training versus jointly trained? Understanding these details would be very helpful for reproducing the results and for further research building on VGGT.
Thank you for your time and insights!
Hi,
We used dinov2_vitb14_reg, initialized from the official DINOv2 pretrained weights, and fine-tuned it jointly with VGGT. As far as I have observed, freezing the DINO weights can also work, but with a performance drop.
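For anyone wanting to try the frozen-backbone variant discussed above, a minimal sketch is shown below. The module names are toy stand-ins, not VGGT's actual classes:

```python
import torch
from torch import nn

# Toy stand-in for the pretrained DINOv2 encoder inside VGGT.
encoder = nn.Sequential(nn.Linear(8, 16), nn.GELU(), nn.Linear(16, 8))
head = nn.Linear(8, 4)

# Freeze the pretrained encoder so only the heads receive gradients.
for p in encoder.parameters():
    p.requires_grad = False
encoder.eval()  # keep backbone norm/dropout layers in inference mode

# Pass only the trainable parameters to the optimizer.
optimizer = torch.optim.AdamW(
    [p for p in head.parameters() if p.requires_grad], lr=1e-4
)
```

Wrapping the frozen forward pass in `torch.no_grad()` would additionally save activation memory, at the cost of precluding any later unfreezing mid-run.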
Hi, I used the pretrained VGGT and it seems the DINO backbone is dinov2_vitl14_reg. Can the author confirm whether this is the correct version, or is it dinov2_vitb14_reg?
@jytime What will happen if I simply replace "dinov2_vitl14_reg" with "dinov2_vits14_reg" and try to load it with from_pretrained? Also, could you please provide the checkpoint for "dinov2_vits14_reg"?
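Simply swapping "dinov2_vitl14_reg" for "dinov2_vits14_reg" and loading the released checkpoint would fail with shape mismatches, since the ViT variants have different embedding widths (ViT-S: 384, ViT-B: 768, ViT-L: 1024, ViT-g: 1536). One way to settle which backbone a checkpoint actually contains is to inspect a weight's embedding dimension. A sketch; the parameter key below is a hypothetical name, not necessarily VGGT's actual naming:

```python
import torch

# Embedding width uniquely identifies the DINOv2 ViT variant.
EMBED_DIMS = {
    384: "dinov2_vits14_reg",
    768: "dinov2_vitb14_reg",
    1024: "dinov2_vitl14_reg",
    1536: "dinov2_vitg14_reg",
}

def backbone_from_state_dict(sd, key="aggregator.patch_embed.cls_token"):
    # `key` is an assumed parameter name; pick any backbone weight whose
    # last dimension equals the embedding width in the real checkpoint.
    return EMBED_DIMS.get(sd[key].shape[-1], "unknown")

# Synthetic example: a ViT-L-sized cls token (width 1024).
fake_sd = {"aggregator.patch_embed.cls_token": torch.zeros(1, 1, 1024)}
print(backbone_from_state_dict(fake_sd))  # → dinov2_vitl14_reg
```

Running the same check on the downloaded VGGT state dict (with the real parameter name) would confirm the ViT-L vs. ViT-B question above.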