Performance of the vitl16_short model
I tried to train a model with the settings given in the vitl16_short.yaml file. The top-1 accuracy is 72.3% for k-NN and 74.9% for linear probing, which is about 8-9% below the reported results of 81.6% (k-NN) and 82.9% (linear). I used 4 nodes with 8 GPUs each (32 GPUs total). Is there anything wrong with my training? What evaluation setting produces the reported results?
Can you give more detail about how you obtained these results? Which checkpoint did you evaluate? The command line in the README evaluates the checkpoint at 25k iterations (in case you missed it).
@qasfb I trained the model with the command line below:
```shell
MASTER_PORT=45621 NODE_RANK=$1 PYTHONPATH=. torchrun \
    --rdzv-id=10000 --nnode=4 --nproc-per-node=8 \
    --master-addr=$MASTER_ADDR --master-port=$MASTER_PORT --node-rank=$NODE_RANK \
    dinov2/train/train.py \
    --config-file dinov2/configs/train/vitl16_short.yaml \
    --output-dir output \
    train.dataset_path=ImageNet:split=TRAIN:root=$data_root:extra=$data_root
```
The k-NN evaluation was done with the following command line:
```shell
MASTER_ADDR='127.0.0.1' MASTER_PORT=45621 NODE_RANK=0 PYTHONPATH=. torchrun \
    --rdzv-id=10000 --nnode=1 --nproc-per-node=8 \
    --master-addr=$MASTER_ADDR --master-port=$MASTER_PORT --node-rank=$NODE_RANK \
    dinov2/eval/knn.py \
    --config-file $output_dir/config.yaml \
    --pretrained-weights output/eval/training_24999/teacher_checkpoint.pth \
    --output-dir output/eval/training_24999/knn \
    --train-dataset ImageNet:split=TRAIN:root=$data_root:extra=$data_root \
    --val-dataset ImageNet:split=VAL:root=$data_root:extra=$data_root
```
It seems you are evaluating the checkpoint at 25k iterations?
@qasfb That is the problem. Thanks.
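For anyone hitting the same issue: a minimal sketch of how one could select the last checkpoint to pass to `--pretrained-weights` instead of the hard-coded `training_24999`. This assumes DINOv2's default layout of `output/eval/training_<iteration>/teacher_checkpoint.pth`; the helper name `latest_checkpoint` is hypothetical, not part of the repo.

```python
from pathlib import Path

def latest_checkpoint(eval_dir):
    """Return the teacher checkpoint path with the highest iteration number.

    Assumes checkpoint directories are named training_<iteration> and each
    contains a teacher_checkpoint.pth (DINOv2's default eval layout).
    """
    candidates = [
        d for d in Path(eval_dir).iterdir()
        if d.is_dir() and d.name.startswith("training_")
    ]
    if not candidates:
        raise FileNotFoundError(f"no training_* directories under {eval_dir}")
    # Sort numerically on the iteration suffix, not lexicographically,
    # so training_124999 beats training_24999.
    latest = max(candidates, key=lambda d: int(d.name.split("_")[1]))
    return latest / "teacher_checkpoint.pth"
```

The numeric sort matters here: a plain string sort would rank `training_24999` above `training_124999`.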
It might be worth clarifying in the README that the example commands for ImageNet evaluation correspond to an intermediate checkpoint (24999).