SimMIM
Cannot reproduce your results: fine-tuned from your released pre-trained ViT-Base model
Hello, I tried to reproduce your ViT-B results on ImageNet-1k. I ran the script following your README.md, except that I used a single node with 8 NVIDIA A100 GPUs instead of two nodes.
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node 8 main_finetune.py \
--cfg /data/users/zhangjunlei/code/simmim/SimMIM-main/configs/vit_base__800ep/simmim_finetune__vit_base__img224__800ep.yaml \
--batch-size 128 \
--data-path /data/users/zhangjunlei/dataset/IMT \
--pretrained /data/users/models/simmim_pretrain__vit_base__img224__800ep.pth \
--output /data/users/zhangjunlei/output/mim \
--tag finetuneIMN_downloadedPretrainedVitbbaseline \
--accumulation-steps 2
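As a sanity check, this single-node setup should keep the per-step effective batch size unchanged relative to the two-node recipe. The formula below is my assumption (per-GPU batch × number of GPUs × accumulation steps), and I am taking 2048 as the intended total:

```python
# Sanity check (my assumption): effective batch size per optimizer step
# = per-GPU batch size * number of GPUs * gradient accumulation steps.
per_gpu_batch = 128  # --batch-size
num_gpus = 8         # one node with 8 A100s
accum_steps = 2      # --accumulation-steps

print(per_gpu_batch * num_gpus * accum_steps)  # 2048
```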
I ran the code without any modification, but the max accuracy is 83.66%, while your paper reports 83.8%. The tail of the log is listed below:
[2022-04-01 01:12:41 simmim_finetune] (main_finetune.py 269): INFO Test: [0/49] Time 3.066 (3.066) Loss 0.3681 (0.3681) Acc@1 93.359 (93.359) Acc@5 98.730 (98.730) Mem 39691MB
[2022-04-01 01:12:53 simmim_finetune] (main_finetune.py 276): INFO * Acc@1 83.596 Acc@5 96.636
[2022-04-01 01:12:53 simmim_finetune] (main_finetune.py 142): INFO Accuracy of the network on the 50000 test images: 83.6%
[2022-04-01 01:12:53 simmim_finetune] (main_finetune.py 144): INFO Max accuracy: 83.62%
[2022-04-01 01:12:53 simmim_finetune] (main_finetune.py 155): INFO Current learning rate for different parameter groups: [1.2106559070000715e-06, 1.2106559070000715e-06, 1.3305729095853683e-06, 1.3305729095853683e-06, 1.5150606058704401e-06, 1.5150606058704401e-06, 1.798887830924397e-06, 1.798887830924397e-06, 2.235545100238177e-06, 2.235545100238177e-06, 2.9073255145670688e-06, 2.9073255145670688e-06, 3.940833844303826e-06, 3.940833844303826e-06, 5.53084665928345e-06, 5.53084665928345e-06, 7.977020220790567e-06, 7.977020220790567e-06, 1.1740364161570747e-05, 1.1740364161570747e-05, 1.7530124070463328e-05, 1.7530124070463328e-05, 2.6437447007221146e-05, 2.6437447007221146e-05, 4.0141020756079324e-05, 4.0141020756079324e-05, 6.122344190816884e-05, 6.122344190816884e-05]
[2022-04-01 01:12:57 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][0/1251] eta 1:10:34 lr 0.000061 time 3.3847 (3.3847) loss 1.5133 (1.5133) grad_norm 2.3348 (2.3348) mem 39691MB
[2022-04-01 01:14:07 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][100/1251] eta 0:13:56 lr 0.000060 time 0.6853 (0.7264) loss 1.4034 (1.3413) grad_norm 2.6896 (2.7750) mem 39691MB
[2022-04-01 01:15:17 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][200/1251] eta 0:12:29 lr 0.000059 time 0.6943 (0.7133) loss 1.4613 (1.3375) grad_norm 2.3938 (2.7978) mem 39691MB
[2022-04-01 01:16:27 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][300/1251] eta 0:11:14 lr 0.000057 time 0.6894 (0.7089) loss 0.7287 (1.3413) grad_norm 2.0634 (2.7904) mem 39691MB
[2022-04-01 01:17:37 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][400/1251] eta 0:10:01 lr 0.000056 time 0.6978 (0.7069) loss 1.0116 (1.3347) grad_norm 2.0660 (nan) mem 39691MB
[2022-04-01 01:18:47 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][500/1251] eta 0:08:49 lr 0.000055 time 0.6884 (0.7053) loss 1.4866 (1.3390) grad_norm 2.5877 (nan) mem 39691MB
[2022-04-01 01:19:56 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][600/1251] eta 0:07:38 lr 0.000053 time 0.6879 (0.7044) loss 1.3395 (1.3345) grad_norm 2.0532 (nan) mem 39691MB
[2022-04-01 01:21:07 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][700/1251] eta 0:06:27 lr 0.000052 time 0.6993 (0.7038) loss 1.5304 (1.3324) grad_norm 2.6916 (nan) mem 39691MB
[2022-04-01 01:22:16 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][800/1251] eta 0:05:17 lr 0.000051 time 0.6891 (0.7033) loss 1.0127 (1.3341) grad_norm 1.9580 (nan) mem 39691MB
[2022-04-01 01:23:26 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][900/1251] eta 0:04:06 lr 0.000050 time 0.6901 (0.7028) loss 0.8164 (1.3327) grad_norm 1.9068 (nan) mem 39691MB
[2022-04-01 01:24:36 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][1000/1251] eta 0:02:56 lr 0.000048 time 0.6892 (0.7024) loss 1.5583 (1.3314) grad_norm 2.2595 (nan) mem 39691MB
[2022-04-01 01:25:46 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][1100/1251] eta 0:01:46 lr 0.000047 time 0.6845 (0.7022) loss 1.3442 (1.3315) grad_norm 2.4191 (nan) mem 39691MB
[2022-04-01 01:26:56 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][1200/1251] eta 0:00:35 lr 0.000046 time 0.7005 (0.7019) loss 1.2828 (1.3303) grad_norm 2.3000 (nan) mem 39691MB
[2022-04-01 01:27:31 simmim_finetune] (main_finetune.py 230): INFO EPOCH 93 training takes 0:14:38
[2022-04-01 01:27:34 simmim_finetune] (main_finetune.py 269): INFO Test: [0/49] Time 3.048 (3.048) Loss 0.3621 (0.3621) Acc@1 93.555 (93.555) Acc@5 98.633 (98.633) Mem 39691MB
[2022-04-01 01:27:47 simmim_finetune] (main_finetune.py 276): INFO * Acc@1 83.550 Acc@5 96.636
[2022-04-01 01:27:47 simmim_finetune] (main_finetune.py 142): INFO Accuracy of the network on the 50000 test images: 83.5%
[2022-04-01 01:27:47 simmim_finetune] (main_finetune.py 144): INFO Max accuracy: 83.62%
[2022-04-01 01:27:47 simmim_finetune] (main_finetune.py 155): INFO Current learning rate for different parameter groups: [1.1549451215637642e-06, 1.1549451215637642e-06, 1.2431484613501714e-06, 1.2431484613501714e-06, 1.378845907175413e-06, 1.378845907175413e-06, 1.5876112084450153e-06, 1.5876112084450153e-06, 1.9087885950136343e-06, 1.9087885950136343e-06, 2.402907651273048e-06, 2.402907651273048e-06, 3.1630908147490695e-06, 3.1630908147490695e-06, 4.332603373942948e-06, 4.332603373942948e-06, 6.131853465010453e-06, 6.131853465010453e-06, 8.899930528191233e-06, 8.899930528191233e-06, 1.3158510625392428e-05, 1.3158510625392428e-05, 1.971017231339427e-05, 1.971017231339427e-05, 2.9789651833397102e-05, 2.9789651833397102e-05, 4.5296543402632226e-05, 4.5296543402632226e-05]
[2022-04-01 01:27:50 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][0/1251] eta 1:12:29 lr 0.000045 time 3.4766 (3.4766) loss 1.2928 (1.2928) grad_norm 2.0603 (2.0603) mem 39691MB
[2022-04-01 01:29:00 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][100/1251] eta 0:13:58 lr 0.000044 time 0.6995 (0.7282) loss 1.5299 (1.3223) grad_norm 2.9111 (2.7888) mem 39691MB
[2022-04-01 01:30:10 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][200/1251] eta 0:12:31 lr 0.000043 time 0.6885 (0.7147) loss 0.9353 (1.3269) grad_norm 2.2225 (2.7882) mem 39691MB
[2022-04-01 01:31:20 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][300/1251] eta 0:11:14 lr 0.000042 time 0.6891 (0.7095) loss 1.2098 (1.3210) grad_norm 2.1065 (2.7789) mem 39691MB
[2022-04-01 01:32:30 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][400/1251] eta 0:10:01 lr 0.000041 time 0.6902 (0.7068) loss 1.3892 (1.3320) grad_norm 2.4860 (2.7760) mem 39691MB
[2022-04-01 01:33:40 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][500/1251] eta 0:08:50 lr 0.000040 time 0.6902 (0.7058) loss 1.4630 (1.3225) grad_norm 2.4686 (2.7736) mem 39691MB
[2022-04-01 01:34:50 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][600/1251] eta 0:07:38 lr 0.000039 time 0.6897 (0.7047) loss 1.0282 (1.3204) grad_norm 2.3859 (2.7731) mem 39691MB
[2022-04-01 01:36:00 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][700/1251] eta 0:06:27 lr 0.000037 time 0.6890 (0.7039) loss 1.2109 (1.3232) grad_norm 2.2731 (2.7694) mem 39691MB
[2022-04-01 01:37:10 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][800/1251] eta 0:05:17 lr 0.000036 time 0.6896 (0.7033) loss 1.3349 (1.3265) grad_norm 2.1328 (2.7723) mem 39691MB
[2022-04-01 01:38:20 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][900/1251] eta 0:04:06 lr 0.000035 time 0.7249 (0.7029) loss 1.5155 (1.3238) grad_norm 2.3807 (2.7779) mem 39691MB
[2022-04-01 01:39:30 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][1000/1251] eta 0:02:56 lr 0.000034 time 0.6991 (0.7025) loss 1.5035 (1.3234) grad_norm 3.5433 (2.7713) mem 39691MB
[2022-04-01 01:40:40 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][1100/1251] eta 0:01:46 lr 0.000033 time 0.6895 (0.7022) loss 1.3362 (1.3223) grad_norm 2.2755 (2.7748) mem 39691MB
[2022-04-01 01:41:50 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][1200/1251] eta 0:00:35 lr 0.000032 time 0.6905 (0.7020) loss 1.1312 (1.3200) grad_norm 2.2855 (2.7730) mem 39691MB
[2022-04-01 01:42:25 simmim_finetune] (main_finetune.py 230): INFO EPOCH 94 training takes 0:14:38
[2022-04-01 01:42:28 simmim_finetune] (main_finetune.py 269): INFO Test: [0/49] Time 3.078 (3.078) Loss 0.3602 (0.3602) Acc@1 93.457 (93.457) Acc@5 98.730 (98.730) Mem 39691MB
[2022-04-01 01:42:41 simmim_finetune] (main_finetune.py 276): INFO * Acc@1 83.656 Acc@5 96.634
[2022-04-01 01:42:41 simmim_finetune] (main_finetune.py 142): INFO Accuracy of the network on the 50000 test images: 83.7%
[2022-04-01 01:42:41 simmim_finetune] (main_finetune.py 144): INFO Max accuracy: 83.66%
[2022-04-01 01:42:41 simmim_finetune] (main_finetune.py 155): INFO Current learning rate for different parameter groups: [1.107709723978354e-06, 1.107709723978354e-06, 1.1690240608640958e-06, 1.1690240608640958e-06, 1.2633538099190834e-06, 1.2633538099190834e-06, 1.4084765007729102e-06, 1.4084765007729102e-06, 1.6317421790095668e-06, 1.6317421790095668e-06, 1.9752278378351925e-06, 1.9752278378351925e-06, 2.5036673129515393e-06, 2.5036673129515393e-06, 3.316651120822842e-06, 3.316651120822842e-06, 4.567395440624847e-06, 4.567395440624847e-06, 6.491617471089471e-06, 6.491617471089471e-06, 9.45195905641966e-06, 9.45195905641966e-06, 1.4006330726158412e-05, 1.4006330726158412e-05, 2.1013056371910337e-05, 2.1013056371910337e-05, 3.1792634288451755e-05, 3.1792634288451755e-05]
[2022-04-01 01:42:44 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][0/1251] eta 1:14:20 lr 0.000032 time 3.5654 (3.5654) loss 1.4688 (1.4688) grad_norm 2.3501 (2.3501) mem 39691MB
[2022-04-01 01:43:54 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][100/1251] eta 0:13:59 lr 0.000031 time 0.6891 (0.7293) loss 1.0123 (1.3321) grad_norm 2.7074 (2.8018) mem 39691MB
[2022-04-01 01:45:04 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][200/1251] eta 0:12:30 lr 0.000030 time 0.6897 (0.7144) loss 1.0944 (1.3172) grad_norm 2.3161 (nan) mem 39691MB
[2022-04-01 01:46:14 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][300/1251] eta 0:11:14 lr 0.000029 time 0.6853 (0.7096) loss 1.5585 (1.3295) grad_norm 2.4806 (nan) mem 39691MB
[2022-04-01 01:47:24 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][400/1251] eta 0:10:01 lr 0.000028 time 0.6891 (0.7070) loss 1.5054 (1.3297) grad_norm 2.2357 (nan) mem 39691MB
[2022-04-01 01:48:34 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][500/1251] eta 0:08:49 lr 0.000027 time 0.6900 (0.7056) loss 0.9077 (1.3258) grad_norm 2.1560 (nan) mem 39691MB
[2022-04-01 01:49:44 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][600/1251] eta 0:07:38 lr 0.000026 time 0.6884 (0.7048) loss 0.8389 (1.3274) grad_norm 2.0718 (nan) mem 39691MB
[2022-04-01 01:50:54 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][700/1251] eta 0:06:27 lr 0.000025 time 0.6897 (0.7041) loss 1.1630 (1.3244) grad_norm 2.7615 (nan) mem 39691MB
[2022-04-01 01:52:04 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][800/1251] eta 0:05:17 lr 0.000024 time 0.6899 (0.7035) loss 1.3777 (1.3235) grad_norm 2.2742 (nan) mem 39691MB
[2022-04-01 01:53:14 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][900/1251] eta 0:04:06 lr 0.000024 time 0.6901 (0.7030) loss 1.3908 (1.3225) grad_norm 2.2728 (nan) mem 39691MB
[2022-04-01 01:54:24 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][1000/1251] eta 0:02:56 lr 0.000023 time 0.6911 (0.7026) loss 1.1246 (1.3207) grad_norm 2.3794 (nan) mem 39691MB
[2022-04-01 01:55:34 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][1100/1251] eta 0:01:46 lr 0.000022 time 0.6891 (0.7024) loss 1.4379 (1.3210) grad_norm 2.5842 (nan) mem 39691MB
[2022-04-01 01:56:44 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][1200/1251] eta 0:00:35 lr 0.000021 time 0.6986 (0.7021) loss 1.4860 (1.3207) grad_norm 2.2715 (nan) mem 39691MB
[2022-04-01 01:57:19 simmim_finetune] (main_finetune.py 230): INFO EPOCH 95 training takes 0:14:38
[2022-04-01 01:57:19 simmim_finetune] (utils.py 60): INFO /data/users/zhangjunlei/output/mim/simmim_finetune/finetune_downloadedPretrainedVitbbaseline/ckpt_epoch_95.pth saving......
[2022-04-01 01:57:20 simmim_finetune] (utils.py 62): INFO /data/users/zhangjunlei/output/mim/simmim_finetune/finetune_downloadedPretrainedVitbbaseline/ckpt_epoch_95.pth saved !!!
[2022-04-01 01:57:23 simmim_finetune] (main_finetune.py 269): INFO Test: [0/49] Time 2.916 (2.916) Loss 0.3640 (0.3640) Acc@1 93.555 (93.555) Acc@5 98.633 (98.633) Mem 39691MB
[2022-04-01 01:57:35 simmim_finetune] (main_finetune.py 276): INFO * Acc@1 83.602 Acc@5 96.632
[2022-04-01 01:57:35 simmim_finetune] (main_finetune.py 142): INFO Accuracy of the network on the 50000 test images: 83.6%
[2022-04-01 01:57:35 simmim_finetune] (main_finetune.py 144): INFO Max accuracy: 83.66%
[2022-04-01 01:57:35 simmim_finetune] (main_finetune.py 155): INFO Current learning rate for different parameter groups: [1.068996329878458e-06, 1.068996329878458e-06, 1.1082728599612733e-06, 1.1082728599612733e-06, 1.168698290857912e-06, 1.168698290857912e-06, 1.2616604922373566e-06, 1.2616604922373566e-06, 1.4046792635903478e-06, 1.4046792635903478e-06, 1.62470814259495e-06, 1.62470814259495e-06, 1.963214110294338e-06, 1.963214110294338e-06, 2.4839925221395496e-06, 2.4839925221395496e-06, 3.285190078824491e-06, 3.285190078824491e-06, 4.517801704493633e-06, 4.517801704493633e-06, 6.414127282446157e-06, 6.414127282446157e-06, 9.331551248526965e-06, 9.331551248526965e-06, 1.3819895811728205e-05, 1.3819895811728205e-05, 2.0725041293576267e-05, 2.0725041293576267e-05]
[2022-04-01 01:57:39 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][0/1251] eta 1:12:07 lr 0.000021 time 3.4596 (3.4596) loss 1.4750 (1.4750) grad_norm 2.6895 (2.6895) mem 39691MB
[2022-04-01 01:58:49 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][100/1251] eta 0:13:55 lr 0.000020 time 0.6900 (0.7259) loss 1.5745 (1.2920) grad_norm 2.6246 (2.7933) mem 39691MB
[2022-04-01 01:59:58 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][200/1251] eta 0:12:28 lr 0.000019 time 0.6896 (0.7125) loss 0.8455 (1.3166) grad_norm 2.4287 (2.7917) mem 39691MB
[2022-04-01 02:01:09 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][300/1251] eta 0:11:13 lr 0.000018 time 0.6900 (0.7086) loss 0.9218 (1.3165) grad_norm 2.2249 (2.7938) mem 39691MB
[2022-04-01 02:02:19 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][400/1251] eta 0:10:01 lr 0.000018 time 0.6907 (0.7068) loss 0.9292 (1.3220) grad_norm 2.1898 (2.7949) mem 39691MB
[2022-04-01 02:03:29 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][500/1251] eta 0:08:49 lr 0.000017 time 0.6890 (0.7053) loss 1.1393 (1.3168) grad_norm 2.5243 (2.7906) mem 39691MB
[2022-04-01 02:04:39 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][600/1251] eta 0:07:38 lr 0.000016 time 0.6888 (0.7045) loss 1.3772 (1.3255) grad_norm 2.5142 (2.7838) mem 39691MB
[2022-04-01 02:05:49 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][700/1251] eta 0:06:27 lr 0.000016 time 0.6895 (0.7038) loss 1.5106 (1.3251) grad_norm 2.3905 (2.7886) mem 39691MB
[2022-04-01 02:06:59 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][800/1251] eta 0:05:17 lr 0.000015 time 0.6896 (0.7033) loss 1.4453 (1.3282) grad_norm 2.3186 (2.7831) mem 39691MB
[2022-04-01 02:08:09 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][900/1251] eta 0:04:06 lr 0.000014 time 0.6890 (0.7029) loss 1.6301 (1.3289) grad_norm 2.6045 (2.7818) mem 39691MB
[2022-04-01 02:09:19 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][1000/1251] eta 0:02:56 lr 0.000014 time 0.6890 (0.7027) loss 1.3683 (1.3259) grad_norm 2.2704 (2.7809) mem 39691MB
[2022-04-01 02:10:29 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][1100/1251] eta 0:01:46 lr 0.000013 time 0.6898 (0.7024) loss 1.1971 (1.3261) grad_norm 2.1725 (inf) mem 39691MB
[2022-04-01 02:11:39 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][1200/1251] eta 0:00:35 lr 0.000012 time 0.6898 (0.7022) loss 1.4186 (1.3273) grad_norm 2.1364 (inf) mem 39691MB
[2022-04-01 02:12:14 simmim_finetune] (main_finetune.py 230): INFO EPOCH 96 training takes 0:14:38
[2022-04-01 02:12:17 simmim_finetune] (main_finetune.py 269): INFO Test: [0/49] Time 3.006 (3.006) Loss 0.3643 (0.3643) Acc@1 93.555 (93.555) Acc@5 98.633 (98.633) Mem 39691MB
[2022-04-01 02:12:29 simmim_finetune] (main_finetune.py 276): INFO * Acc@1 83.592 Acc@5 96.658
[2022-04-01 02:12:29 simmim_finetune] (main_finetune.py 142): INFO Accuracy of the network on the 50000 test images: 83.6%
[2022-04-01 02:12:29 simmim_finetune] (main_finetune.py 144): INFO Max accuracy: 83.66%
[2022-04-01 02:12:29 simmim_finetune] (main_finetune.py 155): INFO Current learning rate for different parameter groups: [1.0388431447101286e-06, 1.0388431447101286e-06, 1.060954812742414e-06, 1.060954812742414e-06, 1.0949727635613148e-06, 1.0949727635613148e-06, 1.1473080725134695e-06, 1.1473080725134695e-06, 1.2278239324398616e-06, 1.2278239324398616e-06, 1.3516944861727725e-06, 1.3516944861727725e-06, 1.542264568838789e-06, 1.542264568838789e-06, 1.8354493114018914e-06, 1.8354493114018914e-06, 2.286502761498972e-06, 2.286502761498972e-06, 2.9804311462637114e-06, 2.9804311462637114e-06, 4.048013276671003e-06, 4.048013276671003e-06, 5.69044732345145e-06, 5.69044732345145e-06, 8.21726893388291e-06, 8.21726893388291e-06, 1.2104686796085155e-05, 1.2104686796085155e-05]
[2022-04-01 02:12:33 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][0/1251] eta 1:10:45 lr 0.000012 time 3.3938 (3.3938) loss 1.1607 (1.1607) grad_norm 2.6309 (2.6309) mem 39691MB
[2022-04-01 02:13:43 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][100/1251] eta 0:13:56 lr 0.000012 time 0.6900 (0.7268) loss 1.4923 (1.3099) grad_norm 2.3810 (2.7979) mem 39691MB
[2022-04-01 02:14:53 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][200/1251] eta 0:12:30 lr 0.000011 time 0.6901 (0.7136) loss 1.3939 (1.3374) grad_norm 2.4213 (2.8035) mem 39691MB
[2022-04-01 02:16:03 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][300/1251] eta 0:11:14 lr 0.000010 time 0.6900 (0.7089) loss 1.4464 (1.3354) grad_norm 2.3146 (2.7829) mem 39691MB
[2022-04-01 02:17:13 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][400/1251] eta 0:10:01 lr 0.000010 time 0.7050 (0.7067) loss 1.4488 (1.3360) grad_norm 2.3009 (2.7833) mem 39691MB
[2022-04-01 02:18:23 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][500/1251] eta 0:08:49 lr 0.000009 time 0.6896 (0.7056) loss 1.5225 (1.3309) grad_norm 2.6082 (2.7907) mem 39691MB
[2022-04-01 02:19:33 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][600/1251] eta 0:07:38 lr 0.000009 time 0.6896 (0.7047) loss 1.5717 (1.3272) grad_norm 2.2648 (2.7870) mem 39691MB
[2022-04-01 02:20:43 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][700/1251] eta 0:06:28 lr 0.000008 time 0.6906 (0.7042) loss 1.2725 (1.3265) grad_norm 2.9951 (2.7761) mem 39691MB
[2022-04-01 02:21:53 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][800/1251] eta 0:05:17 lr 0.000008 time 0.6897 (0.7038) loss 1.5141 (1.3220) grad_norm 2.0174 (2.7768) mem 39691MB
[2022-04-01 02:23:03 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][900/1251] eta 0:04:06 lr 0.000007 time 0.6903 (0.7034) loss 1.5645 (1.3245) grad_norm 2.3992 (2.7796) mem 39691MB
[2022-04-01 02:24:13 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][1000/1251] eta 0:02:56 lr 0.000007 time 0.6900 (0.7030) loss 0.8204 (1.3204) grad_norm 2.7277 (2.7826) mem 39691MB
[2022-04-01 02:25:23 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][1100/1251] eta 0:01:46 lr 0.000007 time 0.6897 (0.7027) loss 1.3750 (1.3185) grad_norm 1.9739 (2.7843) mem 39691MB
[2022-04-01 02:26:33 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][1200/1251] eta 0:00:35 lr 0.000006 time 0.6888 (0.7025) loss 1.2978 (1.3201) grad_norm 2.1809 (2.7888) mem 39691MB
[2022-04-01 02:27:08 simmim_finetune] (main_finetune.py 230): INFO EPOCH 97 training takes 0:14:38
[2022-04-01 02:27:11 simmim_finetune] (main_finetune.py 269): INFO Test: [0/49] Time 3.004 (3.004) Loss 0.3647 (0.3647) Acc@1 93.555 (93.555) Acc@5 98.633 (98.633) Mem 39691MB
[2022-04-01 02:27:24 simmim_finetune] (main_finetune.py 276): INFO * Acc@1 83.596 Acc@5 96.674
[2022-04-01 02:27:24 simmim_finetune] (main_finetune.py 142): INFO Accuracy of the network on the 50000 test images: 83.6%
[2022-04-01 02:27:24 simmim_finetune] (main_finetune.py 144): INFO Max accuracy: 83.66%
[2022-04-01 02:27:24 simmim_finetune] (main_finetune.py 155): INFO Current learning rate for different parameter groups: [1.0172799260266887e-06, 1.0172799260266887e-06, 1.0271166164073458e-06, 1.0271166164073458e-06, 1.0422499862237412e-06, 1.0422499862237412e-06, 1.0655320936335803e-06, 1.0655320936335803e-06, 1.101350720417948e-06, 1.101350720417948e-06, 1.156456300086206e-06, 1.156456300086206e-06, 1.2412341149604493e-06, 1.2412341149604493e-06, 1.371661522459285e-06, 1.371661522459285e-06, 1.5723190724574938e-06, 1.5723190724574938e-06, 1.8810229955316616e-06, 1.8810229955316616e-06, 2.3559521079534575e-06, 2.3559521079534575e-06, 3.0866122809100673e-06, 3.0866122809100673e-06, 4.210704854689466e-06, 4.210704854689466e-06, 5.9400780451193105e-06, 5.9400780451193105e-06]
[2022-04-01 02:27:27 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][0/1251] eta 1:07:01 lr 0.000006 time 3.2148 (3.2148) loss 1.6244 (1.6244) grad_norm 2.8371 (2.8371) mem 39691MB
[2022-04-01 02:28:37 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][100/1251] eta 0:13:52 lr 0.000006 time 0.6894 (0.7234) loss 1.4783 (1.3294) grad_norm 2.2639 (2.7801) mem 39691MB
[2022-04-01 02:29:47 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][200/1251] eta 0:12:27 lr 0.000005 time 0.6890 (0.7116) loss 1.5537 (1.3179) grad_norm 2.2326 (2.7782) mem 39691MB
[2022-04-01 02:30:57 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][300/1251] eta 0:11:12 lr 0.000005 time 0.6901 (0.7073) loss 1.4142 (1.3260) grad_norm 3.1747 (2.7874) mem 39691MB
[2022-04-01 02:32:07 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][400/1251] eta 0:10:00 lr 0.000004 time 0.6900 (0.7054) loss 1.1928 (1.3229) grad_norm 2.6400 (2.7897) mem 39691MB
[2022-04-01 02:33:17 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][500/1251] eta 0:08:48 lr 0.000004 time 0.6899 (0.7043) loss 1.3475 (1.3193) grad_norm 2.2912 (2.7904) mem 39691MB
[2022-04-01 02:34:27 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][600/1251] eta 0:07:38 lr 0.000004 time 0.6880 (0.7036) loss 1.4417 (1.3187) grad_norm 2.1597 (2.7875) mem 39691MB
[2022-04-01 02:35:37 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][700/1251] eta 0:06:27 lr 0.000004 time 0.6896 (0.7030) loss 0.9872 (1.3167) grad_norm 2.2810 (2.7852) mem 39691MB
[2022-04-01 02:36:46 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][800/1251] eta 0:05:16 lr 0.000003 time 0.6900 (0.7026) loss 1.2168 (1.3158) grad_norm 2.3407 (2.7873) mem 39691MB
[2022-04-01 02:37:56 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][900/1251] eta 0:04:06 lr 0.000003 time 0.6897 (0.7023) loss 1.1657 (1.3130) grad_norm 2.2128 (2.7891) mem 39691MB
[2022-04-01 02:39:06 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][1000/1251] eta 0:02:56 lr 0.000003 time 0.6889 (0.7020) loss 1.4734 (1.3160) grad_norm 2.2695 (2.7867) mem 39691MB
[2022-04-01 02:40:16 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][1100/1251] eta 0:01:45 lr 0.000003 time 0.6888 (0.7018) loss 1.4531 (1.3167) grad_norm 2.3891 (2.7864) mem 39691MB
[2022-04-01 02:41:26 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][1200/1251] eta 0:00:35 lr 0.000002 time 0.6893 (0.7015) loss 0.9294 (1.3169) grad_norm 2.5364 (inf) mem 39691MB
[2022-04-01 02:42:01 simmim_finetune] (main_finetune.py 230): INFO EPOCH 98 training takes 0:14:37
[2022-04-01 02:42:04 simmim_finetune] (main_finetune.py 269): INFO Test: [0/49] Time 3.071 (3.071) Loss 0.3626 (0.3626) Acc@1 93.555 (93.555) Acc@5 98.633 (98.633) Mem 39691MB
[2022-04-01 02:42:17 simmim_finetune] (main_finetune.py 276): INFO * Acc@1 83.570 Acc@5 96.640
[2022-04-01 02:42:17 simmim_finetune] (main_finetune.py 142): INFO Accuracy of the network on the 50000 test images: 83.6%
[2022-04-01 02:42:17 simmim_finetune] (main_finetune.py 144): INFO Max accuracy: 83.66%
[2022-04-01 02:42:17 simmim_finetune] (main_finetune.py 155): INFO Current learning rate for different parameter groups: [1.0043279541216191e-06, 1.0043279541216191e-06, 1.0067916651705151e-06, 1.0067916651705151e-06, 1.010581989861124e-06, 1.010581989861124e-06, 1.0164132586159072e-06, 1.0164132586159072e-06, 1.0253844413155735e-06, 1.0253844413155735e-06, 1.0391862608535217e-06, 1.0391862608535217e-06, 1.060419829373442e-06, 1.060419829373442e-06, 1.0930868578656273e-06, 1.0930868578656273e-06, 1.1433438247766814e-06, 1.1433438247766814e-06, 1.2206622354090724e-06, 1.2206622354090724e-06, 1.3396136363819813e-06, 1.3396136363819813e-06, 1.5226157917249184e-06, 1.5226157917249184e-06, 1.8041575691755905e-06, 1.8041575691755905e-06, 2.237298765253548e-06, 2.237298765253548e-06]
[2022-04-01 02:42:20 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][0/1251] eta 1:09:49 lr 0.000002 time 3.3489 (3.3489) loss 1.0033 (1.0033) grad_norm 2.1860 (2.1860) mem 39691MB
[2022-04-01 02:43:30 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][100/1251] eta 0:13:54 lr 0.000002 time 0.6967 (0.7249) loss 0.8255 (1.2927) grad_norm 2.2087 (2.7138) mem 39691MB
[2022-04-01 02:44:40 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][200/1251] eta 0:12:28 lr 0.000002 time 0.6892 (0.7121) loss 1.0470 (1.2953) grad_norm 1.9035 (2.7317) mem 39691MB
[2022-04-01 02:45:50 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][300/1251] eta 0:11:13 lr 0.000002 time 0.6904 (0.7080) loss 0.9423 (1.2993) grad_norm 2.3368 (2.7545) mem 39691MB
[2022-04-01 02:47:00 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][400/1251] eta 0:10:00 lr 0.000002 time 0.6895 (0.7058) loss 1.1605 (1.3136) grad_norm 2.5136 (2.7636) mem 39691MB
[2022-04-01 02:48:10 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][500/1251] eta 0:08:49 lr 0.000001 time 0.6903 (0.7046) loss 0.9751 (1.3128) grad_norm 2.5944 (2.7818) mem 39691MB
[2022-04-01 02:49:20 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][600/1251] eta 0:07:38 lr 0.000001 time 0.7461 (0.7037) loss 1.6204 (1.3107) grad_norm 2.2264 (2.7801) mem 39691MB
[2022-04-01 02:50:30 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][700/1251] eta 0:06:27 lr 0.000001 time 0.6892 (0.7030) loss 1.2943 (1.3101) grad_norm 2.3324 (2.7893) mem 39691MB
[2022-04-01 02:51:40 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][800/1251] eta 0:05:16 lr 0.000001 time 0.6891 (0.7026) loss 1.3340 (1.3111) grad_norm 2.0425 (2.7911) mem 39691MB
[2022-04-01 02:52:50 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][900/1251] eta 0:04:06 lr 0.000001 time 0.7028 (0.7021) loss 1.1269 (1.3138) grad_norm 2.3656 (2.7906) mem 39691MB
[2022-04-01 02:54:00 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][1000/1251] eta 0:02:56 lr 0.000001 time 0.6908 (0.7020) loss 1.0681 (1.3161) grad_norm 2.0883 (2.7906) mem 39691MB
[2022-04-01 02:55:10 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][1100/1251] eta 0:01:45 lr 0.000001 time 0.6897 (0.7017) loss 0.9208 (1.3166) grad_norm 2.1637 (2.7937) mem 39691MB
[2022-04-01 02:56:20 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][1200/1251] eta 0:00:35 lr 0.000001 time 0.6889 (0.7016) loss 1.5211 (1.3184) grad_norm 2.0254 (2.7999) mem 39691MB
[2022-04-01 02:56:55 simmim_finetune] (main_finetune.py 230): INFO EPOCH 99 training takes 0:14:37
[2022-04-01 02:56:55 simmim_finetune] (utils.py 60): INFO /data/users/zhangjunlei/output/mim/simmim_finetune/finetune_downloadedPretrainedVitbbaseline/ckpt_epoch_99.pth saving......
[2022-04-01 02:56:56 simmim_finetune] (utils.py 62): INFO /data/users/zhangjunlei/output/mim/simmim_finetune/finetune_downloadedPretrainedVitbbaseline/ckpt_epoch_99.pth saved !!!
[2022-04-01 02:56:59 simmim_finetune] (main_finetune.py 269): INFO Test: [0/49] Time 2.885 (2.885) Loss 0.3653 (0.3653) Acc@1 93.555 (93.555) Acc@5 98.633 (98.633) Mem 39691MB
[2022-04-01 02:57:11 simmim_finetune] (main_finetune.py 276): INFO * Acc@1 83.606 Acc@5 96.688
[2022-04-01 02:57:11 simmim_finetune] (main_finetune.py 142): INFO Accuracy of the network on the 50000 test images: 83.6%
[2022-04-01 02:57:11 simmim_finetune] (main_finetune.py 144): INFO Max accuracy: 83.66%
[2022-04-01 02:57:11 simmim_finetune] (main_finetune.py 148): INFO Training time 1 day, 0:49:35
[2022-04-01 03:17:39 simmim_finetune] (main_finetune.py 344): INFO Full config saved to /data/users/zhangjunlei/output/mim/simmim_finetune/finetune_downloadedPretrainedVitbbaseline/config.json
[2022-04-01 03:17:39 simmim_finetune] (main_finetune.py 347): INFO AMP_OPT_LEVEL: O1
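One detail I noticed: the averaged grad_norm turns nan or inf in several epochs even though every printed per-step value is finite. If the logger uses a standard running-average meter (my assumption; I have not checked which meter class this repo uses), a single overflowed step under AMP O1 poisons the average for the rest of the epoch, so this may be cosmetic rather than a real divergence. A minimal sketch:

```python
# Minimal sketch (assumes a plain running-average meter, e.g. a timm-style
# AverageMeter). A single nan gradient norm -- as can happen when the AMP loss
# scaler skips a step -- makes the running average nan for the rest of the
# epoch, even though all later per-step values are finite.
import math


class AverageMeter:
    """Running average of a stream of values."""

    def __init__(self):
        self.sum = 0.0
        self.count = 0
        self.avg = 0.0

    def update(self, val, n=1):
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count


meter = AverageMeter()
for grad_norm in [2.3348, 2.6896, float("nan"), 2.0634, 2.0660]:
    meter.update(grad_norm)

print(meter.avg)              # nan -- matches the "(nan)" column in the log
print(math.isnan(meter.avg))  # True
```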
I had the same problem when I fine-tuned ViT-B/32 with 1 node or 4 nodes, but when I fine-tuned with 2 nodes I got 83.784 top-1 accuracy. Try fine-tuning with 2 nodes :)
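For reference, a two-node launch with torch.distributed.launch looks roughly like the sketch below (run the same command on the second node with --node_rank 1; MASTER_ADDR and MASTER_PORT are placeholders for your cluster, and the data/checkpoint/output paths are yours to fill in). Note that with 16 GPUs, --batch-size 128 already gives a 2048 effective batch, so --accumulation-steps should not be needed:

python -m torch.distributed.launch --nnodes 2 --node_rank 0 \
--master_addr $MASTER_ADDR --master_port $MASTER_PORT \
--nproc_per_node 8 main_finetune.py \
--cfg configs/vit_base__800ep/simmim_finetune__vit_base__img224__800ep.yaml \
--batch-size 128 \
--data-path <imagenet-path> \
--pretrained <pretrained-checkpoint> \
--output <output-dir> \
--tag <tag>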