Performance of the vitl16_short model
I tried to train a model with the settings given in the vitl16_short.yaml file. The top-1 accuracy is 72.3% for k-NN and 74.9% for linear probing, which is about 8-9% below the reported results of 81.6% (k-NN) and 82.9% (linear). I used 4 nodes with 8 GPUs each (32 GPUs total). Is there anything wrong with my training? What evaluation setting produces the reported results?
Can you give more detail about how you obtained these results? Which checkpoint did you evaluate? The command line in the README evaluates the checkpoint at 25k iterations (in case you missed it).
@qasfb I trained the model with the command line below:
```shell
MASTER_PORT=45621 NODE_RANK=$1 PYTHONPATH=. torchrun \
    --rdzv-id=10000 --nnode=4 --nproc-per-node=8 \
    --master-addr=$MASTER_ADDR --master-port=$MASTER_PORT --node-rank=$NODE_RANK \
    dinov2/train/train.py \
    --config-file dinov2/configs/train/vitl16_short.yaml \
    --output-dir output \
    train.dataset_path=ImageNet:split=TRAIN:root=$data_root:extra=$data_root
```
The k-NN evaluation was done with the following command line:
```shell
MASTER_ADDR='127.0.0.1' MASTER_PORT=45621 NODE_RANK=0 PYTHONPATH=. torchrun \
    --rdzv-id=10000 --nnode=1 --nproc-per-node=8 \
    --master-addr=$MASTER_ADDR --master-port=$MASTER_PORT --node-rank=$NODE_RANK \
    dinov2/eval/knn.py \
    --config-file $output_dir/config.yaml \
    --pretrained-weights output/eval/training_24999/teacher_checkpoint.pth \
    --output-dir output/eval/training_24999/knn \
    --train-dataset ImageNet:split=TRAIN:root=$data_root:extra=$data_root \
    --val-dataset ImageNet:split=VAL:root=$data_root:extra=$data_root
```
It seems you are evaluating the checkpoint at 25k iterations?
@qasfb That is the problem. Thanks.
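For anyone hitting the same issue: a minimal sketch of how one could select the last checkpoint to pass to `--pretrained-weights` instead of the hard-coded `training_24999`. This assumes DINOv2's default layout of `output/eval/training_<iteration>/teacher_checkpoint.pth`; the helper name `latest_checkpoint` is hypothetical, not part of the repo.

```python
from pathlib import Path

def latest_checkpoint(eval_dir):
    """Return the teacher checkpoint path with the highest iteration number.

    Assumes checkpoint directories are named training_<iteration> and each
    contains a teacher_checkpoint.pth (DINOv2's default eval layout).
    """
    candidates = [
        d for d in Path(eval_dir).iterdir()
        if d.is_dir() and d.name.startswith("training_")
    ]
    if not candidates:
        raise FileNotFoundError(f"no training_* directories under {eval_dir}")
    # Sort numerically on the iteration suffix, not lexicographically,
    # so training_124999 beats training_24999.
    latest = max(candidates, key=lambda d: int(d.name.split("_")[1]))
    return latest / "teacher_checkpoint.pth"
```

The numeric sort matters here: a plain string sort would rank `training_24999` above `training_124999`.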
It might be worth clarifying in the README that the example commands for ImageNet evaluation correspond to an intermediate checkpoint (24999).