InsightFace_TF

How to solve the problem of "summary" errors during training?

Open sunruina2 opened this issue 5 years ago • 0 comments

epoch 0, total_step 90180, total loss is 15.06 , inference loss is 8.29, weight deacy loss is 6.77, training accuracy is 0.312500, time 124.800 samples/sec
epoch 0, total_step 90200, total loss is 15.34 , inference loss is 8.57, weight deacy loss is 6.77, training accuracy is 0.343750, time 132.006 samples/sec
epoch 0, total_step 90220, total loss is 14.04 , inference loss is 7.27, weight deacy loss is 6.77, training accuracy is 0.328125, time 123.523 samples/sec
epoch 0, total_step 90240, total loss is 17.67 , inference loss is 10.90, weight deacy loss is 6.77, training accuracy is 0.281250, time 130.974 samples/sec
epoch 0, total_step 90260, total loss is nan , inference loss is nan, weight deacy loss is nan, training accuracy is 0.000000, time 128.621 samples/sec
epoch 0, total_step 90280, total loss is nan , inference loss is nan, weight deacy loss is nan, training accuracy is 0.000000, time 133.669 samples/sec
Traceback (most recent call last):
  File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma_1
  [[{{node resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma_1}} = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma_1/tag, resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma/read/_1409)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train_nets.py", line 210, in <module>
    summary_op_val = sess.run(summary_op, feed_dict=feed_dict)
  File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma_1
  [[node resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma_1 (defined at train_nets.py:161) = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma_1/tag, resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma/read/_1409)]]

Caused by op 'resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma_1', defined at:
  File "train_nets.py", line 161, in <module>
    summaries.append(tf.summary.histogram(var.op.name, var))
  File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/summary/summary.py", line 187, in histogram
    tag=tag, values=values, name=scope)
  File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/ops/gen_logging_ops.py", line 284, in histogram_summary
    "HistogramSummary", tag=tag, values=values, name=name)
  File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/data/sunruina/anaconda2/envs/py36ten12/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Nan in summary histogram for: resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma_1
  [[node resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma_1 (defined at train_nets.py:161) = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma_1/tag, resnet_v1_50/block3/unit_13/bottleneck_v1/conv2_bn/BatchNorm/gamma/read/_1409)]]
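
In the log above, the losses are already nan at total_step 90260, before the histogram summary raises. As a rough aid for narrowing down where training diverged, here is a minimal sketch that scans a saved training log for the first step with a NaN total loss. It assumes the line format printed above; `first_nan_step` and `LOG_RE` are hypothetical names and are not part of train_nets.py.

```python
import math
import re

# Assumed format, taken only from the log lines in this issue:
# "... total_step <int>, total loss is <value> , ..."
LOG_RE = re.compile(r"total_step (\d+), total loss is (\S+)")


def first_nan_step(log_text):
    """Return the first total_step whose total loss is NaN, or None if there is none."""
    for step, loss in LOG_RE.findall(log_text):
        if math.isnan(float(loss)):
            return int(step)
    return None


sample = (
    "epoch 0, total_step 90240, total loss is 17.67 , inference loss is 10.90\n"
    "epoch 0, total_step 90260, total loss is nan , inference loss is nan\n"
)
print(first_nan_step(sample))  # prints 90260
```

This only locates the divergence in the log; it does not explain why the loss became nan.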

sunruina2, Oct 17 '19 04:10