Swin-Transformer
Cannot Reproduce Swin Small Results (Only Achieves 82.9 Top-1)
Hi @zeliu98
Thanks for this great work. I am trying to reproduce the results reported in the paper for the Swin Small architecture using exactly the same hyper-parameters as published in the config files. Specifically, I am using 8 V100 GPUs (which I believe is the same setup used in the paper) and running the following command:
python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 main.py \
--cfg configs/swin_small_patch4_window7_224.yaml --data-path <imagenet-path> --batch-size 128
But the best top-1 accuracy I can get is 82.976, which falls short of the reported 83.2. I have also attached the log from the best training run in case it is useful.
log_rank0.txt
How can we achieve the accuracy of 83.2?
I would really appreciate a response, as I have spent a lot of time trying to reproduce your results and have not been able to match them.
For the record, I am not one of the official authors of Swin-Transformer. In my personal experience, most deep learning models are hard to reproduce to exactly the numbers reported in their papers. One reason is the random seed, which influences many aspects of a model, for example parameter initialization. It can also be very difficult to reproduce the same results even when the random seed is fixed, because the framework (e.g., PyTorch) may introduce randomness wherever parallelization is employed. For more information about reproducibility, you may want to search online. For your problem, my advice is to fix the random seed to the same value the authors used in their experiments.
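For reference, here is a minimal sketch (not the authors' code, and not part of this repo) of the kind of seeding and determinism settings I mean; the helper name seed_everything is hypothetical, and even with all of these flags some CUDA kernels may remain non-deterministic:

import os
import random

import numpy as np
import torch

def seed_everything(seed: int = 0) -> None:
    # Seed every RNG that affects initialization and data order.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Ask cuDNN for deterministic kernels and disable auto-tuning,
    # which otherwise selects convolution algorithms non-deterministically.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Required by some deterministic CUDA matmul kernels.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    # Raise an error on ops that have no deterministic implementation
    # (available in recent PyTorch versions).
    torch.use_deterministic_algorithms(True)

seed_everything(0)

Note that forcing fully deterministic algorithms can slow training, and in multi-GPU runs some variance typically remains regardless.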
In this setup, we are using exactly the same seed as set by the authors here. So why can't we get anything close to the 83.2 claimed in the paper? We are also using the same hardware setup.
On ImageNet, even a small percentage matters; that is how a new SOTA is claimed. I hope @zeliu98 and the other authors can look into this. I am strictly following their config and recommended seed, so reproducible results should be possible.
Adding others for visibility @ancientmooner @caoyue10
Which subset did you choose to evaluate on, the test set or the val set? I cannot find any ground truth for the test set.
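One sanity check that separates a training issue from an evaluation issue is to score the officially released Swin-S checkpoint on the ImageNet-1K val set and confirm it reaches 83.2. A rough sketch, assuming the --eval and --resume options behave as described in the repo's README and with <checkpoint-path> standing in for the downloaded weights:

python -m torch.distributed.launch --nproc_per_node 1 --master_port 12345 main.py --eval \
--cfg configs/swin_small_patch4_window7_224.yaml --resume <checkpoint-path> --data-path <imagenet-path>

If that matches the paper, the gap is more likely run-to-run training variance than a difference in the evaluation protocol.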