InsightFace_TF
How to solve the "Nan in summary histogram" error during training?
epoch 0, total_step 90180, total loss is 15.06 , inference loss is 8.29, weight deacy loss is 6.77, training accuracy is 0.312500, time 124.800 samples/sec
epoch 0, total_step 90200, total loss is 15.34 , inference loss is 8.57, weight deacy loss is 6.77, training accuracy is 0.343750, time 132.006 samples/sec
epoch 0, total_step 90220, total loss is 14.04 , inference loss is 7.27, weight deacy loss is 6.77, training accuracy is 0.328125, time 123.523 samples/sec
epoch 0, total_step 90240, total loss is 17.67 , inference loss is 10.90, weight deacy loss is 6.77, training accuracy is 0.281250, time 130.974 samples/sec
epoch 0, total_step 90260, total loss is nan , inference loss is nan, weight deacy loss is nan, training accuracy is 0.000000, time 128.621 samples/sec
epoch 0, total_step 90280, total loss is nan , inference loss is nan, weight deacy loss is nan, training accuracy is 0.000000, time 133.669 samples/sec
Traceback (most recent call last):
File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma_1 [[{{node resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma_1}} = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma_1/tag, resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma/read/_1409)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train_nets.py", line 210, in
Caused by op 'resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma_1', defined at:
File "train_nets.py", line 161, in
InvalidArgumentError (see above for traceback): Nan in summary histogram for: resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma_1 [[node resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma_1 (defined at train_nets.py:161) = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma_1/tag, resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma/read/_1409)]]
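The summary error looks like a symptom rather than the root cause. TensorFlow's `HistogramSummary` op raises `InvalidArgumentError` when the tensor it is given contains NaN. The log shows that total, inference, and weight decay losses all became `nan` at step 90260, with training accuracy dropping to 0, so the BatchNorm `gamma` weights were most likely already NaN by the time the summary ran. The sketch below locates the first logged step with a non-finite loss, which is where to start debugging (learning rate, input data, etc.). It assumes the log format shown above. `first_nonfinite_step` is a hypothetical helper and is not part of InsightFace_TF.

```python
import math
import re

# Matches the per-step lines printed during training, e.g.
# "epoch 0, total_step 90260, total loss is nan , inference loss is nan, ..."
# (format assumed from the log above).
STEP_RE = re.compile(r"epoch (\d+), total_step (\d+), total loss is (\S+)")


def first_nonfinite_step(log_text):
    """Return (epoch, step, loss_str) for the first step whose total loss is
    NaN or infinite, or None if every logged total loss is finite."""
    for epoch, step, loss in STEP_RE.findall(log_text):
        # float() accepts "nan" and "inf", so math.isfinite catches both.
        if not math.isfinite(float(loss)):
            return int(epoch), int(step), loss
    return None


if __name__ == "__main__":
    log = (
        "epoch 0, total_step 90240, total loss is 17.67 , inference loss is 10.90 "
        "epoch 0, total_step 90260, total loss is nan , inference loss is nan "
    )
    print(first_nonfinite_step(log))
```

On the two sample lines this reports epoch 0, step 90260. Only the total loss is checked, because a NaN in either the inference loss or the weight decay loss propagates into it.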