Swin-Transformer
Failed to reproduce the performance reported in Table 6 (Swin V2)
Hi there,
It's exciting to learn that post-norm can stabilize the training process. We tried to re-implement Swin-T with post-norm, as reported in Table 6 of the manuscript (Swin V2), which lists 81.6% top-1 accuracy. After moving the LayerNorm to after the MHSA/FFN in each residual branch of the original Swin-T, we found that grad_norm fluctuates after about 10 epochs of training (occasionally becoming extremely large), and then training diverges. Whether or not there is an out_norm on top of the backbone, training eventually diverged. Are there any extra settings required for post-norm? It would be much appreciated if more details could be released.
Thanks
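For readers who want to see the modification concretely, here is a minimal sketch of the difference between the pre-norm block of Swin V1 and the post-norm variant described above. The `attn` and `mlp` submodules are placeholders for the repo's `WindowAttention` and `Mlp` (window partitioning and drop_path are omitted for brevity); this is an illustration, not the official Swin V2 code:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Toy residual block illustrating pre-norm (Swin V1) vs. post-norm placement."""

    def __init__(self, dim: int, attn: nn.Module, mlp: nn.Module, post_norm: bool = False):
        super().__init__()
        self.attn, self.mlp = attn, mlp
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.post_norm = post_norm

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.post_norm:
            # Post-norm: LayerNorm is applied to the residual-branch *output*
            # before it is added back to the shortcut.
            x = x + self.norm1(self.attn(x))
            x = x + self.norm2(self.mlp(x))
        else:
            # Pre-norm (original Swin V1): LayerNorm is applied to the
            # residual-branch *input*.
            x = x + self.attn(self.norm1(x))
            x = x + self.mlp(self.norm2(x))
        return x
```

For a quick shape check, `ResidualBlock(96, nn.Identity(), nn.Identity(), post_norm=True)` runs on any `(B, L, 96)` tensor; in the real model the `attn`/`mlp` placeholders would be the windowed attention and MLP of each Swin block.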
Can you share the code of Swin V2?
Actually, I tried it on Swin V1 (tiny). If you would like to reproduce my experiments, start from the code at https://github.com/microsoft/Swin-Transformer/blob/main/models/swin_transformer.py and move the LayerNorm in each block from the pre-residual-branch position to the post-residual-branch position. I did not tune any training or evaluation settings; I followed the original recipe and trained on ImageNet.
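For reference, the `grad_norm` values in the log below are global gradient norms. A generic way to compute such a quantity in PyTorch (not necessarily the exact utility used by the repo's `main.py`) is:

```python
import torch

def total_grad_norm(model: torch.nn.Module) -> float:
    """Global L2 norm over all parameter gradients (call after loss.backward())."""
    norms = [p.grad.detach().norm(2) for p in model.parameters() if p.grad is not None]
    if not norms:
        return 0.0
    return torch.norm(torch.stack(norms), 2).item()
```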
Here is the log for epoch 14; the gradient norm fluctuates:

```
[2022-03-14 09:02:33 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 146): INFO Accuracy of the network on the 50000 test images: 16.4%
[2022-03-14 09:02:33 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 148): INFO Max accuracy: 32.64%
[2022-03-14 09:02:37 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][0/1251] eta 1:26:04 lr 0.000700 time 4.1279 (4.1279) loss 6.6041 (6.6041) grad_norm 11.6209 (11.6209) mem 14764MB
[2022-03-14 09:02:43 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][10/1251] eta 0:19:53 lr 0.000701 time 0.6263 (0.9618) loss 6.4966 (6.3835) grad_norm 9.8667 (12.8351) mem 14764MB
[2022-03-14 09:02:50 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][20/1251] eta 0:16:34 lr 0.000701 time 0.6462 (0.8076) loss 6.1081 (6.2525) grad_norm 10.9848 (12.5064) mem 14764MB
[2022-03-14 09:02:56 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][30/1251] eta 0:15:18 lr 0.000701 time 0.6348 (0.7522) loss 5.8018 (6.2524) grad_norm 11.1529 (13.4512) mem 14764MB
[2022-03-14 09:03:02 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][40/1251] eta 0:14:36 lr 0.000702 time 0.6275 (0.7238) loss 6.1094 (6.2375) grad_norm 17.4747 (13.5872) mem 14764MB
[2022-03-14 09:03:09 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][50/1251] eta 0:14:08 lr 0.000702 time 0.6264 (0.7069) loss 6.4405 (6.2434) grad_norm 16.7142 (14.3942) mem 14764MB
[2022-03-14 09:03:15 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][60/1251] eta 0:13:47 lr 0.000703 time 0.6324 (0.6949) loss 6.2260 (6.2675) grad_norm 17.1434 (14.6072) mem 14764MB
[2022-03-14 09:03:21 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][70/1251] eta 0:13:30 lr 0.000703 time 0.6376 (0.6862) loss 6.1336 (6.2803) grad_norm 21.0641 (15.1279) mem 14764MB
[2022-03-14 09:03:28 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][80/1251] eta 0:13:16 lr 0.000703 time 0.6297 (0.6800) loss 6.0949 (6.3030) grad_norm 10.6999 (16.1512) mem 14764MB
[2022-03-14 09:03:34 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][90/1251] eta 0:13:04 lr 0.000704 time 0.6304 (0.6759) loss 6.5553 (6.3260) grad_norm 9.7439 (15.8608) mem 14764MB
[2022-03-14 09:03:41 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][100/1251] eta 0:12:53 lr 0.000704 time 0.6404 (0.6723) loss 6.3239 (6.3367) grad_norm 12.0286 (15.6899) mem 14764MB
[2022-03-14 09:03:47 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][110/1251] eta 0:12:43 lr 0.000705 time 0.6415 (0.6691) loss 6.5125 (6.3395) grad_norm 7.0615 (15.3232) mem 14764MB
[2022-03-14 09:03:53 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][120/1251] eta 0:12:33 lr 0.000705 time 0.6391 (0.6664) loss 6.2824 (6.3306) grad_norm 8.3860 (14.7469) mem 14764MB
[2022-03-14 09:04:00 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][130/1251] eta 0:12:24 lr 0.000705 time 0.6290 (0.6643) loss 5.8217 (6.3346) grad_norm 5.9462 (14.2118) mem 14764MB
[2022-03-14 09:04:06 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][140/1251] eta 0:12:16 lr 0.000706 time 0.6533 (0.6627) loss 6.5063 (6.3326) grad_norm 7.9695 (13.8363) mem 14764MB
[2022-03-14 09:04:13 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][150/1251] eta 0:12:07 lr 0.000706 time 0.6308 (0.6610) loss 6.1553 (6.3252) grad_norm 7.6219 (13.3851) mem 14764MB
[2022-03-14 09:04:19 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][160/1251] eta 0:11:59 lr 0.000707 time 0.6553 (0.6597) loss 6.5334 (6.3329) grad_norm 7.7822 (12.9743) mem 14764MB
[2022-03-14 09:04:25 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][170/1251] eta 0:11:51 lr 0.000707 time 0.6392 (0.6583) loss 5.9105 (6.3226) grad_norm 4.9925 (12.6055) mem 14764MB
[2022-03-14 09:04:32 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][180/1251] eta 0:11:43 lr 0.000707 time 0.6356 (0.6572) loss 5.9303 (6.3053) grad_norm 5.8747 (12.2422) mem 14764MB
[2022-03-14 09:04:38 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][190/1251] eta 0:11:36 lr 0.000708 time 0.6381 (0.6561) loss 6.3529 (6.3090) grad_norm 5.3271 (11.9122) mem 14764MB
[2022-03-14 09:04:44 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][200/1251] eta 0:11:28 lr 0.000708 time 0.6363 (0.6553) loss 6.4181 (6.3086) grad_norm 6.0856 (11.6893) mem 14764MB
[2022-03-14 09:04:51 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][210/1251] eta 0:11:21 lr 0.000709 time 0.6279 (0.6545) loss 6.2717 (6.3060) grad_norm 12.6544 (11.5105) mem 14764MB
[2022-03-14 09:04:57 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][220/1251] eta 0:11:13 lr 0.000709 time 0.6266 (0.6536) loss 6.1681 (6.3049) grad_norm 5.6731 (11.2966) mem 14764MB
[2022-03-14 09:05:04 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][230/1251] eta 0:11:06 lr 0.000709 time 0.6269 (0.6529) loss 6.5316 (6.3071) grad_norm 9.9720 (11.0989) mem 14764MB
[2022-03-14 09:05:10 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][240/1251] eta 0:10:59 lr 0.000710 time 0.6321 (0.6521) loss 6.5547 (6.3115) grad_norm 4.1293 (10.8179) mem 14764MB
[2022-03-14 09:05:16 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][250/1251] eta 0:10:52 lr 0.000710 time 0.6408 (0.6516) loss 6.5584 (6.3074) grad_norm 6.5613 (10.6225) mem 14764MB
[2022-03-14 09:05:23 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][260/1251] eta 0:10:45 lr 0.000711 time 0.6290 (0.6510) loss 5.8341 (6.3028) grad_norm 7.8036 (10.4777) mem 14764MB
[2022-03-14 09:05:29 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][270/1251] eta 0:10:38 lr 0.000711 time 0.6464 (0.6506) loss 5.8300 (6.2990) grad_norm 5.5182 (10.3799) mem 14764MB
[2022-03-14 09:05:35 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][280/1251] eta 0:10:31 lr 0.000711 time 0.6436 (0.6501) loss 6.2353 (6.2980) grad_norm 6.5363 (10.2560) mem 14764MB
[2022-03-14 09:05:42 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][290/1251] eta 0:10:24 lr 0.000712 time 0.6342 (0.6497) loss 6.4183 (6.2946) grad_norm 9.3385 (10.2041) mem 14764MB
[2022-03-14 09:05:48 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][300/1251] eta 0:10:17 lr 0.000712 time 0.6398 (0.6494) loss 6.0243 (6.2928) grad_norm 8.5172 (10.1990) mem 14764MB
[2022-03-14 09:05:55 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][310/1251] eta 0:10:10 lr 0.000713 time 0.6562 (0.6491) loss 6.1640 (6.2866) grad_norm 8.2794 (10.3017) mem 14764MB
[2022-03-14 09:06:01 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][320/1251] eta 0:10:03 lr 0.000713 time 0.6399 (0.6487) loss 6.4301 (6.2851) grad_norm 9.1655 (10.4611) mem 14764MB
[2022-03-14 09:06:07 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][330/1251] eta 0:09:57 lr 0.000713 time 0.6424 (0.6484) loss 6.4165 (6.2858) grad_norm 7.2062 (10.3942) mem 14764MB
[2022-03-14 09:06:14 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][340/1251] eta 0:09:50 lr 0.000714 time 0.6399 (0.6481) loss 6.3000 (6.2823) grad_norm 7.0257 (10.2886) mem 14764MB
[2022-03-14 09:06:20 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][350/1251] eta 0:09:43 lr 0.000714 time 0.6437 (0.6478) loss 5.9323 (6.2821) grad_norm 7.8830 (10.1768) mem 14764MB
[2022-03-14 09:06:26 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][360/1251] eta 0:09:36 lr 0.000715 time 0.6542 (0.6475) loss 6.6580 (6.2837) grad_norm 9.0249 (10.1534) mem 14764MB
[2022-03-14 09:06:33 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][370/1251] eta 0:09:30 lr 0.000715 time 0.6377 (0.6473) loss 6.5887 (6.2808) grad_norm 13.0732 (10.1640) mem 14764MB
[2022-03-14 09:06:39 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][380/1251] eta 0:09:23 lr 0.000715 time 0.6345 (0.6471) loss 6.1648 (6.2820) grad_norm 8.5424 (10.1272) mem 14764MB
[2022-03-14 09:06:46 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][390/1251] eta 0:09:16 lr 0.000716 time 0.6379 (0.6469) loss 6.4885 (6.2832) grad_norm 9.0556 (10.0746) mem 14764MB
[2022-03-14 09:06:52 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][400/1251] eta 0:09:10 lr 0.000716 time 0.6348 (0.6468) loss 6.2715 (6.2850) grad_norm 6.8595 (9.9948) mem 14764MB
[2022-03-14 09:06:58 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][410/1251] eta 0:09:03 lr 0.000717 time 0.6409 (0.6466) loss 6.3243 (6.2854) grad_norm 9.6413 (9.9104) mem 14764MB
[2022-03-14 09:07:05 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][420/1251] eta 0:08:57 lr 0.000717 time 0.6450 (0.6465) loss 6.0618 (6.2832) grad_norm 7.4447 (9.8642) mem 14764MB
[2022-03-14 09:07:11 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][430/1251] eta 0:08:50 lr 0.000717 time 0.6400 (0.6463) loss 6.1800 (6.2840) grad_norm 9.9286 (9.8952) mem 14764MB
[2022-03-14 09:07:18 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][440/1251] eta 0:08:43 lr 0.000718 time 0.6336 (0.6460) loss 6.4188 (6.2813) grad_norm 9.2126 (9.9035) mem 14764MB
[2022-03-14 09:07:24 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][450/1251] eta 0:08:37 lr 0.000718 time 0.6285 (0.6459) loss 6.1997 (6.2806) grad_norm 6.7710 (9.8711) mem 14764MB
[2022-03-14 09:07:30 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][460/1251] eta 0:08:30 lr 0.000719 time 0.6275 (0.6457) loss 6.4992 (6.2822) grad_norm 5.8936 (9.8202) mem 14764MB
[2022-03-14 09:07:37 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][470/1251] eta 0:08:24 lr 0.000719 time 0.6364 (0.6454) loss 6.5686 (6.2795) grad_norm 6.5677 (9.7638) mem 14764MB
[2022-03-14 09:07:43 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][480/1251] eta 0:08:17 lr 0.000719 time 0.6307 (0.6452) loss 6.1417 (6.2786) grad_norm 9.2035 (9.7925) mem 14764MB
[2022-03-14 09:07:49 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][490/1251] eta 0:08:10 lr 0.000720 time 0.6304 (0.6450) loss 6.4989 (6.2814) grad_norm 9.3491 (10.1653) mem 14764MB
[2022-03-14 09:07:56 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][500/1251] eta 0:08:04 lr 0.000720 time 0.6439 (0.6449) loss 6.4690 (6.2826) grad_norm 8.9872 (10.1395) mem 14764MB
[2022-03-14 09:08:02 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][510/1251] eta 0:07:57 lr 0.000721 time 0.6409 (0.6447) loss 6.0100 (6.2849) grad_norm 7.2058 (10.1445) mem 14764MB
[2022-03-14 09:08:09 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][520/1251] eta 0:07:51 lr 0.000721 time 0.6338 (0.6446) loss 6.3051 (6.2842) grad_norm 7.0466 (10.0985) mem 14764MB
[2022-03-14 09:08:15 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][530/1251] eta 0:07:44 lr 0.000721 time 0.6257 (0.6444) loss 6.4635 (6.2834) grad_norm 7.8616 (10.0778) mem 14764MB
[2022-03-14 09:08:21 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][540/1251] eta 0:07:38 lr 0.000722 time 0.6374 (0.6443) loss 5.7057 (6.2815) grad_norm 5.4850 (10.0385) mem 14764MB
[2022-03-14 09:08:28 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][550/1251] eta 0:07:31 lr 0.000722 time 0.6329 (0.6442) loss 5.8941 (6.2821) grad_norm 7.6228 (10.0000) mem 14764MB
[2022-03-14 09:08:34 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][560/1251] eta 0:07:25 lr 0.000723 time 0.6522 (0.6441) loss 5.9393 (6.2839) grad_norm 6.3866 (9.9587) mem 14764MB
[2022-03-14 09:08:40 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][570/1251] eta 0:07:18 lr 0.000723 time 0.6338 (0.6440) loss 6.1669 (6.2838) grad_norm 7.3301 (9.9408) mem 14764MB
[2022-03-14 09:08:47 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][580/1251] eta 0:07:12 lr 0.000723 time 0.6497 (0.6439) loss 6.6578 (6.2856) grad_norm 8.4536 (9.9720) mem 14764MB
[2022-03-14 09:08:53 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][590/1251] eta 0:07:05 lr 0.000724 time 0.6509 (0.6438) loss 6.0437 (6.2880) grad_norm 9.1089 (9.9531) mem 14764MB
[2022-03-14 09:09:00 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][600/1251] eta 0:06:59 lr 0.000724 time 0.6393 (0.6437) loss 6.5388 (6.2883) grad_norm 16.3138 (9.9896) mem 14764MB
[2022-03-14 09:09:06 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][610/1251] eta 0:06:52 lr 0.000725 time 0.6253 (0.6436) loss 6.3011 (6.2868) grad_norm 7.7303 (9.9662) mem 14764MB
[2022-03-14 09:09:12 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][620/1251] eta 0:06:46 lr 0.000725 time 0.6287 (0.6435) loss 6.2271 (6.2852) grad_norm 8.4853 (10.0451) mem 14764MB
[2022-03-14 09:09:19 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][630/1251] eta 0:06:39 lr 0.000725 time 0.6376 (0.6434) loss 6.5074 (6.2877) grad_norm 9.4230 (10.1039) mem 14764MB
[2022-03-14 09:09:25 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][640/1251] eta 0:06:33 lr 0.000726 time 0.6221 (0.6433) loss 6.3328 (6.2871) grad_norm 11.5074 (10.1277) mem 14764MB
[2022-03-14 09:09:31 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][650/1251] eta 0:06:26 lr 0.000726 time 0.6403 (0.6433) loss 5.6992 (6.2853) grad_norm 22.5294 (10.1513) mem 14764MB
[2022-03-14 09:09:38 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][660/1251] eta 0:06:20 lr 0.000727 time 0.6318 (0.6432) loss 6.1254 (6.2879) grad_norm 5.3352 (10.1213) mem 14764MB
[2022-03-14 09:09:44 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][670/1251] eta 0:06:13 lr 0.000727 time 0.6319 (0.6430) loss 5.9913 (6.2870) grad_norm 6.4008 (10.0833) mem 14764MB
[2022-03-14 09:09:51 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][680/1251] eta 0:06:07 lr 0.000727 time 0.6398 (0.6430) loss 6.1736 (6.2874) grad_norm 5.8858 (10.0838) mem 14764MB
[2022-03-14 09:09:57 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][690/1251] eta 0:06:00 lr 0.000728 time 0.6397 (0.6429) loss 6.2499 (6.2880) grad_norm 8.1473 (10.0636) mem 14764MB
[2022-03-14 09:10:03 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][700/1251] eta 0:05:54 lr 0.000728 time 0.6413 (0.6428) loss 6.1512 (6.2885) grad_norm 5.3174 (10.0026) mem 14764MB
[2022-03-14 09:10:10 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][710/1251] eta 0:05:47 lr 0.000729 time 0.6220 (0.6427) loss 5.8303 (6.2891) grad_norm 4.2068 (9.9372) mem 14764MB
[2022-03-14 09:10:16 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][720/1251] eta 0:05:41 lr 0.000729 time 0.6437 (0.6426) loss 6.3989 (6.2907) grad_norm 7.2940 (9.8662) mem 14764MB
[2022-03-14 09:10:22 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][730/1251] eta 0:05:34 lr 0.000729 time 0.6512 (0.6425) loss 6.5220 (6.2912) grad_norm 5.9162 (9.8150) mem 14764MB
[2022-03-14 09:10:29 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][740/1251] eta 0:05:28 lr 0.000730 time 0.6430 (0.6425) loss 6.5124 (6.2919) grad_norm 6.0650 (9.7714) mem 14764MB
[2022-03-14 09:10:35 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][750/1251] eta 0:05:21 lr 0.000730 time 0.6319 (0.6424) loss 6.2465 (6.2930) grad_norm 5.6122 (9.7488) mem 14764MB
[2022-03-14 09:10:42 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][760/1251] eta 0:05:15 lr 0.000731 time 0.6321 (0.6423) loss 5.9782 (6.2933) grad_norm 7.9615 (9.7384) mem 14764MB
[2022-03-14 09:10:48 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][770/1251] eta 0:05:08 lr 0.000731 time 0.6401 (0.6422) loss 6.2265 (6.2933) grad_norm 8.4785 (9.6920) mem 14764MB
[2022-03-14 09:10:54 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][780/1251] eta 0:05:02 lr 0.000731 time 0.6255 (0.6421) loss 6.1945 (6.2919) grad_norm 6.2676 (9.6644) mem 14764MB
[2022-03-14 09:11:01 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][790/1251] eta 0:04:55 lr 0.000732 time 0.6284 (0.6420) loss 6.4136 (6.2921) grad_norm 5.2703 (9.6368) mem 14764MB
[2022-03-14 09:11:07 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][800/1251] eta 0:04:49 lr 0.000732 time 0.6373 (0.6420) loss 5.7119 (6.2910) grad_norm 7.6856 (9.5961) mem 14764MB
[2022-03-14 09:11:13 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][810/1251] eta 0:04:43 lr 0.000733 time 0.6353 (0.6419) loss 6.4453 (6.2923) grad_norm 5.8408 (9.5487) mem 14764MB
[2022-03-14 09:11:20 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][820/1251] eta 0:04:36 lr 0.000733 time 0.6436 (0.6419) loss 6.6094 (6.2922) grad_norm 6.3717 (9.5163) mem 14764MB
[2022-03-14 09:11:26 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][830/1251] eta 0:04:30 lr 0.000733 time 0.6355 (0.6418) loss 6.4326 (6.2911) grad_norm 6.2078 (9.4632) mem 14764MB
[2022-03-14 09:11:32 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][840/1251] eta 0:04:23 lr 0.000734 time 0.6205 (0.6417) loss 5.8829 (6.2893) grad_norm 7.6940 (9.4194) mem 14764MB
[2022-03-14 09:11:39 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][850/1251] eta 0:04:17 lr 0.000734 time 0.6439 (0.6417) loss 6.3727 (6.2895) grad_norm 3.3050 (9.3658) mem 14764MB
[2022-03-14 09:11:45 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][860/1251] eta 0:04:10 lr 0.000735 time 0.6377 (0.6416) loss 6.5920 (6.2890) grad_norm 6.6718 (9.3059) mem 14764MB
[2022-03-14 09:11:51 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][870/1251] eta 0:04:04 lr 0.000735 time 0.6220 (0.6415) loss 6.6029 (6.2896) grad_norm 3.8135 (9.2420) mem 14764MB
[2022-03-14 09:11:58 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][880/1251] eta 0:03:57 lr 0.000735 time 0.6557 (0.6415) loss 5.6844 (6.2887) grad_norm 5.7636 (9.1901) mem 14764MB
[2022-03-14 09:12:04 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][890/1251] eta 0:03:51 lr 0.000736 time 0.6409 (0.6414) loss 6.2024 (6.2885) grad_norm 5.0368 (9.1396) mem 14764MB
[2022-03-14 09:12:11 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][900/1251] eta 0:03:45 lr 0.000736 time 0.6336 (0.6413) loss 5.9559 (6.2886) grad_norm 4.8301 (9.0912) mem 14764MB
[2022-03-14 09:12:17 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][910/1251] eta 0:03:38 lr 0.000737 time 0.6618 (0.6412) loss 5.7374 (6.2875) grad_norm 5.6281 (9.0449) mem 14764MB
[2022-03-14 09:12:23 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][920/1251] eta 0:03:32 lr 0.000737 time 0.6330 (0.6412) loss 5.7960 (6.2858) grad_norm 5.7271 (9.0161) mem 14764MB
[2022-03-14 09:12:30 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][930/1251] eta 0:03:25 lr 0.000737 time 0.6615 (0.6412) loss 6.5598 (6.2843) grad_norm 8.6092 (8.9983) mem 14764MB
[2022-03-14 09:12:36 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][940/1251] eta 0:03:19 lr 0.000738 time 0.6429 (0.6411) loss 6.1084 (6.2832) grad_norm 6.7170 (8.9836) mem 14764MB
[2022-03-14 09:12:42 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][950/1251] eta 0:03:12 lr 0.000738 time 0.6571 (0.6411) loss 6.0152 (6.2814) grad_norm 10.9909 (8.9750) mem 14764MB
[2022-03-14 09:12:49 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][960/1251] eta 0:03:06 lr 0.000739 time 0.6379 (0.6411) loss 6.5567 (6.2819) grad_norm 7.9970 (8.9580) mem 14764MB
[2022-03-14 09:12:55 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][970/1251] eta 0:03:00 lr 0.000739 time 0.6371 (0.6411) loss 5.7192 (6.2813) grad_norm 6.1566 (8.9717) mem 14764MB
[2022-03-14 09:13:02 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][980/1251] eta 0:02:53 lr 0.000739 time 0.6679 (0.6411) loss 6.4395 (6.2809) grad_norm 5.0046 (8.9565) mem 14764MB
[2022-03-14 09:13:08 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][990/1251] eta 0:02:47 lr 0.000740 time 0.6211 (0.6411) loss 5.9389 (6.2805) grad_norm 6.4271 (9.0098) mem 14764MB
[2022-03-14 09:13:14 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][1000/1251] eta 0:02:40 lr 0.000740 time 0.6398 (0.6411) loss 6.1609 (6.2800) grad_norm 15.9076 (9.0168) mem 14764MB
[2022-03-14 09:13:21 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][1010/1251] eta 0:02:34 lr 0.000741 time 0.6230 (0.6411) loss 5.8673 (6.2804) grad_norm 13.2889 (9.0164) mem 14764MB
[2022-03-14 09:13:27 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][1020/1251] eta 0:02:28 lr 0.000741 time 0.6445 (0.6410) loss 5.6570 (6.2799) grad_norm 9.9896 (9.0043) mem 14764MB
[2022-03-14 09:13:34 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][1030/1251] eta 0:02:21 lr 0.000741 time 0.6438 (0.6410) loss 6.4273 (6.2775) grad_norm 5.6865 (9.0535) mem 14764MB
[2022-03-14 09:13:40 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][1040/1251] eta 0:02:15 lr 0.000742 time 0.6374 (0.6410) loss 6.5207 (6.2782) grad_norm 7.3589 (9.0449) mem 14764MB
[2022-03-14 09:13:46 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][1050/1251] eta 0:02:08 lr 0.000742 time 0.6341 (0.6409) loss 5.6717 (6.2768) grad_norm 7.5549 (9.0369) mem 14764MB
[2022-03-14 09:13:53 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][1060/1251] eta 0:02:02 lr 0.000743 time 0.6367 (0.6410) loss 5.9471 (6.2771) grad_norm 12.2015 (9.0643) mem 14764MB
[2022-03-14 09:13:59 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][1070/1251] eta 0:01:56 lr 0.000743 time 0.6380 (0.6410) loss 6.3182 (6.2775) grad_norm 8.0450 (9.0591) mem 14764MB
[2022-03-14 09:14:06 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][1080/1251] eta 0:01:49 lr 0.000743 time 0.6361 (0.6409) loss 5.7649 (6.2771) grad_norm 8.7188 (9.0548) mem 14764MB
[2022-03-14 09:14:12 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][1090/1251] eta 0:01:43 lr 0.000744 time 0.6389 (0.6409) loss 6.1890 (6.2776) grad_norm 6.2961 (9.0484) mem 14764MB
[2022-03-14 09:14:18 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][1100/1251] eta 0:01:36 lr 0.000744 time 0.6464 (0.6409) loss 6.2762 (6.2768) grad_norm 8.0812 (9.0486) mem 14764MB
[2022-03-14 09:14:25 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][1110/1251] eta 0:01:30 lr 0.000745 time 0.6377 (0.6409) loss 5.9014 (6.2754) grad_norm 12.7924 (9.0608) mem 14764MB
[2022-03-14 09:14:31 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][1120/1251] eta 0:01:23 lr 0.000745 time 0.6315 (0.6409) loss 6.3879 (6.2740) grad_norm 7.9810 (9.0655) mem 14764MB
[2022-03-14 09:14:37 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][1130/1251] eta 0:01:17 lr 0.000745 time 0.6479 (0.6408) loss 6.3873 (6.2744) grad_norm 8.2998 (9.0816) mem 14764MB
[2022-03-14 09:14:44 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][1140/1251] eta 0:01:11 lr 0.000746 time 0.6363 (0.6408) loss 6.2817 (6.2748) grad_norm 14.0788 (9.1010) mem 14764MB
[2022-03-14 09:14:50 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][1150/1251] eta 0:01:04 lr 0.000746 time 0.6421 (0.6407) loss 6.4133 (6.2749) grad_norm 6.8397 (9.1062) mem 14764MB
[2022-03-14 09:14:57 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][1160/1251] eta 0:00:58 lr 0.000747 time 0.6321 (0.6407) loss 6.0611 (6.2746) grad_norm 6.6298 (9.1426) mem 14764MB
[2022-03-14 09:15:03 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][1170/1251] eta 0:00:51 lr 0.000747 time 0.6378 (0.6407) loss 5.8670 (6.2748) grad_norm 6.0898 (9.1290) mem 14764MB
[2022-03-14 09:15:09 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][1180/1251] eta 0:00:45 lr 0.000747 time 0.6724 (0.6407) loss 6.4942 (6.2759) grad_norm 11.2443 (9.1384) mem 14764MB
[2022-03-14 09:15:16 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][1190/1251] eta 0:00:39 lr 0.000748 time 0.6389 (0.6407) loss 6.1190 (6.2753) grad_norm 8.0706 (9.1366) mem 14764MB
[2022-03-14 09:15:22 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][1200/1251] eta 0:00:32 lr 0.000748 time 0.6286 (0.6406) loss 6.3116 (6.2749) grad_norm 12.6369 (9.1646) mem 14764MB
[2022-03-14 09:15:29 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][1210/1251] eta 0:00:26 lr 0.000749 time 0.6481 (0.6406) loss 6.4324 (6.2749) grad_norm 13.3404 (9.2353) mem 14764MB
[2022-03-14 09:15:35 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][1220/1251] eta 0:00:19 lr 0.000749 time 0.6235 (0.6406) loss 6.5972 (6.2746) grad_norm 21.8618 (9.3254) mem 14764MB
[2022-03-14 09:15:41 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][1230/1251] eta 0:00:13 lr 0.000749 time 0.6361 (0.6406) loss 6.3502 (6.2755) grad_norm 13.5770 (9.4596) mem 14764MB
[2022-03-14 09:15:48 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][1240/1251] eta 0:00:07 lr 0.000750 time 0.6311 (0.6406) loss 6.5427 (6.2760) grad_norm 15.5059 (9.5862) mem 14764MB
[2022-03-14 09:15:54 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 226): INFO Train: [14/300][1250/1251] eta 0:00:00 lr 0.000750 time 0.6284 (0.6405) loss 6.6049 (6.2762) grad_norm 29.3853 (9.7251) mem 14764MB
[2022-03-14 09:15:54 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 233): INFO EPOCH 14 training takes 0:13:21
[2022-03-14 09:15:58 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 273): INFO Test: [0/49] Time 3.867 (3.867) Loss 5.3453 (5.3453) Acc@1 8.984 (8.984) Acc@5 22.949 (22.949) Mem 14764MB
[2022-03-14 09:16:02 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 273): INFO Test: [10/49] Time 0.192 (0.669) Loss 5.3070 (5.3251) Acc@1 9.863 (9.934) Acc@5 23.535 (22.967) Mem 14764MB
[2022-03-14 09:16:05 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 273): INFO Test: [20/49] Time 0.194 (0.522) Loss 5.2598 (5.3249) Acc@1 10.352 (9.584) Acc@5 23.926 (23.056) Mem 14764MB
[2022-03-14 09:16:09 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 273): INFO Test: [30/49] Time 0.194 (0.472) Loss 5.3817 (5.3184) Acc@1 9.473 (9.529) Acc@5 23.145 (23.195) Mem 14764MB
[2022-03-14 09:16:13 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 273): INFO Test: [40/49] Time 1.046 (0.464) Loss 5.3244 (5.3208) Acc@1 8.496 (9.468) Acc@5 23.340 (23.097) Mem 14764MB
[2022-03-14 09:16:15 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 279): INFO * Acc@1 9.516 Acc@5 23.130
[2022-03-14 09:16:15 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 146): INFO Accuracy of the network on the 50000 test images: 9.5%
[2022-03-14 09:16:15 swin_v1_tiny_patch4_window7_224_WindowAttention] (main.py 148): INFO Max accuracy: 32.64%
```
https://github.com/microsoft/Swin-Transformer/issues/183#issuecomment-1086493042
Hello, I'm a first-year research student. I want to experiment with the effect of Swin V2. If you could share the source code, I would very much appreciate your help. My email is [email protected] or [email protected]
Here is an unofficial reimplementation; give it a try if needed: https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/swin_transformer_v2_cr.py
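A quick way to try it, assuming a recent timm install (the exact model names depend on the timm release, so check the registry first):

```python
import timm
import torch

# List the Swin V2 (CR) variants registered by the installed timm version,
# since the exact names defined in swin_transformer_v2_cr.py may vary.
print(timm.list_models('*swinv2_cr*'))

# 'swinv2_cr_tiny_ns_224' is one of the variants defined in that file
# (the "ns" version adds an extra norm layer per stage for stability).
model = timm.create_model('swinv2_cr_tiny_ns_224', pretrained=False)
model.eval()

with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))  # ImageNet-1k logits, shape (1, 1000)
```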