faster-rcnn.pytorch
NaN loss while using Adam
The NaN loss always appears near the beginning of training. The loss then returns to normal, becomes abnormal again, and so on. The SGD optimizer does not have this problem.
Backbone: Res-101
Dataset: VOC2007
WARNING:root:NaN or Inf found in input tensor.
WARNING:root:NaN or Inf found in input tensor.
[session 2][epoch 1][iter 2200/5011] loss: nan, lr: 1.00e-04 fg/bg=(44/212), time cost: 46.774358 rpn_cls: 0.3461, rpn_box: 0.0280, rcnn_cls: 4792356.0000, rcnn_box 3345985.2500
[session 2][epoch 1][iter 2300/5011] loss: 26642731337.7278, lr: 1.00e-04 fg/bg=(64/192), time cost: 46.891399 rpn_cls: 0.2703, rpn_box: 0.0300, rcnn_cls: 1.7583, rcnn_box 0.6415
[session 2][epoch 1][iter 2400/5011] loss: 5.2273, lr: 1.00e-04 fg/bg=(64/192), time cost: 46.817043 rpn_cls: 0.4445, rpn_box: 0.0740, rcnn_cls: 1.6762, rcnn_box 0.5618
[session 2][epoch 1][iter 2500/5011] loss: 6926028074.9964, lr: 1.00e-04 fg/bg=(4/252), time cost: 46.602897 rpn_cls: 0.3145, rpn_box: 0.0402, rcnn_cls: 1.1404, rcnn_box 0.0001
[session 2][epoch 1][iter 2600/5011] loss: 2.1567, lr: 1.00e-04 fg/bg=(5/251), time cost: 47.172931 rpn_cls: 0.2332, rpn_box: 0.0079, rcnn_cls: 1.0538, rcnn_box 0.0214
[session 2][epoch 1][iter 2700/5011] loss: 2.0732, lr: 1.00e-04 fg/bg=(48/208), time cost: 46.693046 rpn_cls: 0.5486, rpn_box: 0.3073, rcnn_cls: 1.3116, rcnn_box 0.4148
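To narrow down which loss term blows up, the log above can be scanned automatically. The following is only a rough sketch, not part of faster-rcnn.pytorch: the function name `suspicious_entries`, the regex, and the `limit` threshold of 1e3 are all assumptions made for illustration. It flags entries whose total loss is non-finite or very large and reports the largest component printed for that entry.

```python
import math
import re

# Loss components as they are named in the log lines above.
COMPONENTS = ("rpn_cls", "rpn_box", "rcnn_cls", "rcnn_box")

# Hypothetical pattern for one progress entry; re.S lets an entry span
# several physical lines if the log is wrapped.
ENTRY_RE = re.compile(
    r"\[iter\s+(?P<iter>\d+)/\d+\]\s+loss:\s+(?P<loss>[^,\s]+),.*?"
    r"rpn_cls:\s+(?P<rpn_cls>[^,\s]+),\s+rpn_box:\s+(?P<rpn_box>[^,\s]+),\s+"
    r"rcnn_cls:\s+(?P<rcnn_cls>[^,\s]+),\s+rcnn_box:?\s+(?P<rcnn_box>[^,\s]+)",
    re.S,
)

# Three entries copied from the log in this issue.
SAMPLE_LOG = (
    "[session 2][epoch 1][iter 2200/5011] loss: nan, lr: 1.00e-04 "
    "fg/bg=(44/212), time cost: 46.774358 rpn_cls: 0.3461, rpn_box: 0.0280, "
    "rcnn_cls: 4792356.0000, rcnn_box 3345985.2500 "
    "[session 2][epoch 1][iter 2300/5011] loss: 26642731337.7278, lr: 1.00e-04 "
    "fg/bg=(64/192), time cost: 46.891399 rpn_cls: 0.2703, rpn_box: 0.0300, "
    "rcnn_cls: 1.7583, rcnn_box 0.6415 "
    "[session 2][epoch 1][iter 2400/5011] loss: 5.2273, lr: 1.00e-04 "
    "fg/bg=(64/192), time cost: 46.817043 rpn_cls: 0.4445, rpn_box: 0.0740, "
    "rcnn_cls: 1.6762, rcnn_box 0.5618"
)


def suspicious_entries(log_text, limit=1e3):
    """Return (iteration, total_loss, largest_component) for abnormal entries.

    An entry counts as abnormal when its total loss is NaN/Inf or when the
    total or any component exceeds `limit` (an arbitrary threshold chosen
    for this sketch).
    """
    flagged = []
    for match in ENTRY_RE.finditer(log_text):
        loss = float(match["loss"])  # float() accepts "nan" and "inf"
        parts = {name: float(match[name]) for name in COMPONENTS}
        abnormal = (
            not math.isfinite(loss)
            or loss > limit
            or any(value > limit for value in parts.values())
        )
        if abnormal:
            largest = max(parts, key=parts.get)
            flagged.append((int(match["iter"]), loss, largest))
    return flagged


if __name__ == "__main__":
    for iteration, loss, largest in suspicious_entries(SAMPLE_LOG):
        print(f"iter {iteration}: loss={loss}, largest component={largest}")
```

On the sample entries this flags iterations 2200 and 2300 and leaves 2400 alone. At iteration 2200 the `rcnn_cls` and `rcnn_box` terms are both in the millions, while at 2300 the total is huge even though every printed component looks normal. One possible reading, not confirmed here, is that the printed total covers more iterations than the printed components do.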