
When I use SN instead of BN, there is a big difference between val acc and train acc

Open GYxiaOH opened this issue 6 years ago • 1 comments

When I use your resnetv2sn50 code, the difference between val and train accuracy is normal (though I only ran 10 epochs). But when I use my own CNN model, there is a big difference between val acc and train acc, like this:

Epoch [1/120]: 100%|#| 625/625 [1:07:57<00:00, 6.43s/it, loss=6.5377, lr=0.1000, top1_avg=1.02, top1_val=3.47, top5_avg=3.75, top5_val=10.40]
Validation: 100%|######| 24/24 [02:20<00:00, 5.84s/it, loss=7.6148, top1_avg=0.54, top5_avg=2.26]
Epoch [2/120]: 100%|#| 625/625 [1:07:23<00:00, 6.40s/it, loss=5.3173, lr=0.2050, top1_avg=6.57, top1_val=10.21, top5_avg=18.45, top5_val=26.07]
Validation: 100%|######| 24/24 [02:18<00:00, 6.10s/it, loss=7.7053, top1_avg=0.50, top5_avg=1.98]
Epoch [3/120]: 100%|#| 625/625 [1:07:15<00:00, 6.51s/it, loss=4.6169, lr=0.3100, top1_avg=13.02, top1_val=15.82, top5_avg=30.95, top5_val=35.99]
Validation: 100%|######| 24/24 [02:19<00:00, 5.85s/it, loss=9.0973, top1_avg=0.21, top5_avg=1.11]
Epoch [4/120]: 100%|#| 625/625 [1:07:30<00:00, 6.48s/it, loss=4.1465, lr=0.4150, top1_avg=18.47, top1_val=21.44, top5_avg=39.85, top5_val=42.77]
Validation: 100%|######| 24/24 [02:18<00:00, 5.92s/it, loss=7.7036, top1_avg=0.90, top5_avg=3.23]
Epoch [5/120]: 100%|#| 625/625 [1:07:26<00:00, 6.52s/it, loss=3.8405, lr=0.5200, top1_avg=22.54, top1_val=24.46, top5_avg=45.77, top5_val=49.37]
Validation: 100%|######| 24/24 [02:18<00:00, 5.75s/it, loss=7.5990, top1_avg=1.46, top5_avg=5.14]
Epoch [6/120]: 100%|#| 625/625 [1:07:15<00:00, 6.41s/it, loss=3.6326, lr=0.6250, top1_avg=25.56, top1_val=27.59, top5_avg=49.71, top5_val=51.22]
Validation: 100%|######| 24/24 [02:20<00:00, 5.88s/it, loss=8.4800, top1_avg=1.21, top5_avg=5.50]
Epoch [7/120]: 100%|#| 625/625 [1:07:25<00:00, 6.25s/it, loss=3.4629, lr=0.6249, top1_avg=28.14, top1_val=27.83, top5_avg=52.97, top5_val=52.20]
Validation: 100%|#####| 24/24 [02:18<00:00, 5.77s/it, loss=7.4793, top1_avg=3.26, top5_avg=10.19]
Epoch [8/120]: 100%|#| 625/625 [1:07:22<00:00, 6.48s/it, loss=3.3294, lr=0.6245, top1_avg=30.29, top1_val=30.03, top5_avg=55.50, top5_val=55.62]
Validation: 100%|#####| 24/24 [02:20<00:00, 5.91s/it, loss=5.3514, top1_avg=8.52, top5_avg=22.21]
Epoch [9/120]: 100%|#| 625/625 [1:07:23<00:00, 6.56s/it, loss=3.2302, lr=0.6239, top1_avg=31.93, top1_val=33.84, top5_avg=57.33, top5_val=58.84]
Validation: 100%|#####| 24/24 [02:21<00:00, 5.81s/it, loss=5.1456, top1_avg=9.44, top5_avg=24.98]
Epoch [10/120]: 100%|#| 625/625 [1:07:24<00:00, 6.44s/it, loss=3.1546, lr=0.6231, top1_avg=33.16, top1_val=33.15, top5_avg=58.67, top5_val=58.30]
Validation: 100%|####| 24/24 [02:19<00:00, 5.91s/it, loss=4.6928, top1_avg=13.57, top5_avg=31.87]
Epoch [11/120]: 100%|#| 625/625 [1:07:24<00:00, 6.12s/it, loss=3.1030, lr=0.6220, top1_avg=34.02, top1_val=36.08, top5_avg=59.62, top5_val=60.50]
Validation: 100%|####| 24/24 [02:20<00:00, 5.87s/it, loss=5.0949, top1_avg=10.75, top5_avg=26.40]
Epoch [12/120]: 100%|#| 625/625 [1:07:25<00:00, 6.28s/it, loss=3.0508, lr=0.6207, top1_avg=34.98, top1_val=35.55, top5_avg=60.59, top5_val=60.11]
Validation: 100%|####| 24/24 [02:22<00:00, 6.03s/it, loss=5.0900, top1_avg=10.55, top5_avg=25.98]

What should I pay attention to? I noticed that you changed the order of BN (SN) and conv. Is that important?

GYxiaOH, Jul 26 '18 02:07

@GYxiaOH Try using batch average when evaluating the BN statistics in SN. Batch average is more stable than moving average for BN. In some tasks there can be a difference; please see Figure 8 in the paper.

pluo911, Jul 26 '18 07:07
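
For readers unsure what the suggestion above amounts to, here is a rough sketch of the two ways to estimate BN statistics for evaluation. It assumes that "batch average" means averaging the per-batch means and variances over a number of training mini-batches with the weights frozen, which is how I read the paper; check the paper and the repository for the exact procedure. The function names `moving_average_stats` and `batch_average_stats` are hypothetical and not part of the Switchable-Normalization code, and NumPy stands in for the repository's PyTorch layers to keep the example small.

```python
import numpy as np


def moving_average_stats(batches, momentum=0.9):
    """Exponential moving average of per-batch mean/variance.

    This mirrors the usual BN running statistics: later batches dominate,
    and the estimate keeps a trace of the initial values (0 and 1).
    """
    num_features = batches[0].shape[1]
    running_mean = np.zeros(num_features)
    running_var = np.ones(num_features)
    for x in batches:
        running_mean = momentum * running_mean + (1 - momentum) * x.mean(axis=0)
        running_var = momentum * running_var + (1 - momentum) * x.var(axis=0)
    return running_mean, running_var


def batch_average_stats(batches):
    """Plain average of per-batch mean/variance over all given batches.

    Assumption: the batches are activations collected from training
    mini-batches with the network weights frozen.
    """
    means = np.stack([x.mean(axis=0) for x in batches])
    variances = np.stack([x.var(axis=0) for x in batches])
    return means.mean(axis=0), variances.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 50 synthetic mini-batches of shape (batch_size=32, features=4).
    batches = [rng.normal(loc=2.0, scale=3.0, size=(32, 4)) for _ in range(50)]
    print("moving average:", moving_average_stats(batches))
    print("batch average: ", batch_average_stats(batches))
```

With few or noisy batches the moving-average estimate depends on batch order and on its initial values, while the batch average weights every batch equally. That may be one source of a train/val gap like the one in the log above, though the log alone does not confirm it.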