micronet

nin_gc: loss becomes extremely large after training for a number of steps

Open EdwardVincentMa opened this issue 2 years ago • 2 comments

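Training started out normally and accuracy reached a little over 80%. I left it running overnight, and when I checked in the morning at epoch 195, the loss was enormous and the model was not converging at all. Unbelievable.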

EdwardVincentMa, Nov 25 '21 03:11

Test set: Average loss: 18154214.1728, Accuracy: 1000/10000 (10.00%)
Best Accuracy: 77.37%

Train Epoch: 175 [0/50000 (0%)] Loss: 15677834.000000 LR: 0.0001
Train Epoch: 175 [3200/50000 (6%)] Loss: 4205774.000000 LR: 0.0001
Train Epoch: 175 [6400/50000 (13%)] Loss: 1340564.750000 LR: 0.0001
Train Epoch: 175 [9600/50000 (19%)] Loss: 573608.937500 LR: 0.0001
Train Epoch: 175 [12800/50000 (26%)] Loss: 13077519.000000 LR: 0.0001
Train Epoch: 175 [16000/50000 (32%)] Loss: 1872735.250000 LR: 0.0001
Train Epoch: 175 [19200/50000 (38%)] Loss: 845358.062500 LR: 0.0001
Train Epoch: 175 [22400/50000 (45%)] Loss: 20978710.000000 LR: 0.0001
Train Epoch: 175 [25600/50000 (51%)] Loss: 635413.625000 LR: 0.0001
Train Epoch: 175 [28800/50000 (58%)] Loss: 26684102.000000 LR: 0.0001
Train Epoch: 175 [32000/50000 (64%)] Loss: 18137484.000000 LR: 0.0001
Train Epoch: 175 [35200/50000 (70%)] Loss: 645895.500000 LR: 0.0001
Train Epoch: 175 [38400/50000 (77%)] Loss: 27134622.000000 LR: 0.0001
Train Epoch: 175 [41600/50000 (83%)] Loss: 3623150.500000 LR: 0.0001
Train Epoch: 175 [44800/50000 (90%)] Loss: 9524407.000000 LR: 0.0001
Train Epoch: 175 [48000/50000 (96%)] Loss: 785436.125000 LR: 0.0001

EdwardVincentMa, Nov 25 '21 03:11

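Did you fuse the BN layers? If so, training does tend to be more jittery. Use a smaller learning rate. It is better to train a floating-point model first, load it, and then run QAT.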

666DZY666, Dec 20 '21 09:12
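
For readers who want to try the suggested workflow (train in floating point, load those weights, then fine-tune with QAT at a smaller learning rate), here is a minimal sketch in plain PyTorch. It does not use micronet's own API: `make_model`, `train_steps`, the random data, and both learning rates are hypothetical placeholders, and the "QAT" model here is just a second copy of the same toy network standing in for a quantization-aware one.

```python
import io

import torch
import torch.nn as nn


def make_model():
    # Hypothetical stand-in network; substitute the real float / QAT models.
    return nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        nn.BatchNorm2d(8),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(8, 10),
    )


def train_steps(model, lr, steps=5):
    # A few SGD steps on random data, only to keep the sketch runnable.
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    loss = None
    for _ in range(steps):
        x = torch.randn(16, 3, 32, 32)
        y = torch.randint(0, 10, (16,))
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()


torch.manual_seed(0)

# 1) Train the floating-point model first and save a checkpoint
#    (an in-memory buffer here; a file path works the same way).
float_model = make_model()
train_steps(float_model, lr=1e-2)
buffer = io.BytesIO()
torch.save(float_model.state_dict(), buffer)
buffer.seek(0)

# 2) Build the model to be fine-tuned and load the float weights.
#    strict=False tolerates extra quantizer parameters a real QAT model may add.
qat_model = make_model()
load_result = qat_model.load_state_dict(torch.load(buffer), strict=False)
print("missing keys:", load_result.missing_keys)
print("unexpected keys:", load_result.unexpected_keys)

# 3) Fine-tune with a smaller learning rate than the float run (placeholder value).
qat_loss = train_steps(qat_model, lr=1e-4)
print("fine-tune loss:", qat_loss)
```

Checking `missing_keys` and `unexpected_keys` after loading is a cheap way to confirm that the float weights really reached the QAT model before fine-tuning starts.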