InvalidArgumentError (see above for traceback): Nan in summary histogram for: DNC/FF_step/dense_1/bias_0

Open SeekPoint opened this issue 7 years ago • 1 comments

Summary generated.
Step 74400 Test cost == 31.212728500 Time == 8.57s
Summary generated.
Step 74500 Test cost == 10.503337860 Time == 8.38s
Summary generated.
Step 74600 Test cost == 749542.875000000 Time == 8.54s
Summary generated.
Step 74700 Test cost == 866.475341797 Time == 8.72s
Summary generated.
Step 74800 Test cost == 6.063397408 Time == 8.58s
Summary generated.
Step 74900 Test cost == 19.894033432 Time == 8.50s
Summary generated.
Step 75000 Test cost == 3009.345458984 Time == 8.53s
Model saved!
Summary generated.
Step 75100 Test cost == 401.853302002 Time == 8.68s
Summary generated.
Step 75200 Test cost == 201.970886230 Time == 8.54s
Summary generated.
Step 75300 Test cost == 4504.419433594 Time == 8.59s
Summary generated.
Step 75400 Test cost == 6352.577148438 Time == 8.56s
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1139, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1121, in _run_fn
    status, run_metadata)
  File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: DNC/FF_step/dense_1/bias_0
  [[Node: DNC/FF_step/dense_1/bias_0 = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](DNC/FF_step/dense_1/bias_0/tag, DNC/FF_step/dense_1/bias/read)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 54, in <module>
    dnc.run_session(task, Hp, project_path) # , restore_path=restore_path)
  File "/home/mldl/ub16_prj/DNC/src/controller.py", line 94, in run_session
    mask: m})
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: DNC/FF_step/dense_1/bias_0
  [[Node: DNC/FF_step/dense_1/bias_0 = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](DNC/FF_step/dense_1/bias_0/tag, DNC/FF_step/dense_1/bias/read)]]

Caused by op 'DNC/FF_step/dense_1/bias_0', defined at:
  File "main.py", line 54, in <module>
    dnc.run_session(task, Hp, project_path) # , restore_path=restore_path)
  File "/home/mldl/ub16_prj/DNC/src/controller.py", line 57, in run_session
    tf.summary.histogram(variable.name, variable)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/summary/summary.py", line 221, in histogram
    tag=scope.rstrip('/'), values=values, name=scope)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_logging_ops.py", line 131, in _histogram_summary
    name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Nan in summary histogram for: DNC/FF_step/dense_1/bias_0
  [[Node: DNC/FF_step/dense_1/bias_0 = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](DNC/FF_step/dense_1/bias_0/tag, DNC/FF_step/dense_1/bias/read)]]

mldl@mldlUB1604:~/ub16_prj/DNC/src$
mldl@mldlUB1604:~/ub16_prj/DNC/src$
mldl@mldlUB1604:~/ub16_prj/DNC/src$ python3
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.__version__
'1.2.1'

SeekPoint • Jul 25 '17 14:07

Yeah, this is a well-known problem with DNCs: NaNs show up sometimes, and there doesn't really seem to be a way to fix it other than restarting the training process.

I do welcome a conversation about possible ways to solve the problem.

bgavran • Jul 25 '17 20:07
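
One way to automate the "restart the training process" workaround mentioned above is to check for non-finite values after each update and fall back to the last good checkpoint instead of crashing. The following is a minimal, framework-agnostic sketch in plain Python, not code from the DNC repository: `train_with_rollback`, `toy_step`, and `checkpoint_every` are hypothetical names, and parameters are modeled as a flat list of floats. In the TensorFlow 1.x setup from the traceback, the same idea would presumably mean restoring the most recent saved model (the log shows "Model saved!" at step 75000) rather than copying a list.

```python
import copy
import math


def is_finite(values):
    """Return True if every number in a flat list is finite (no NaN or inf)."""
    return all(math.isfinite(v) for v in values)


def train_with_rollback(step_fn, params, num_steps, checkpoint_every=100):
    """Run step_fn repeatedly; on a non-finite loss or parameter, roll back.

    step_fn(params, step) -> (new_params, loss) is a hypothetical stand-in
    for one optimizer update. Returns (final_params, number_of_rollbacks).
    """
    last_good = copy.deepcopy(params)  # in-memory stand-in for a checkpoint
    rollbacks = 0
    for step in range(num_steps):
        new_params, loss = step_fn(params, step)
        if not (math.isfinite(loss) and is_finite(new_params)):
            # Discard the bad update and resume from the last checkpoint.
            params = copy.deepcopy(last_good)
            rollbacks += 1
            continue
        params = new_params
        if (step + 1) % checkpoint_every == 0:
            last_good = copy.deepcopy(params)
    return params, rollbacks


def toy_step(params, step):
    """Hypothetical update that injects NaNs at step 7 to mimic a blow-up."""
    if step == 7:
        return [float("nan")] * len(params), float("nan")
    new_params = [p * 0.9 for p in params]
    loss = sum(p * p for p in new_params)
    return new_params, loss


final_params, rollbacks = train_with_rollback(
    toy_step, [1.0, -2.0], num_steps=10, checkpoint_every=5
)
print("rollbacks:", rollbacks, "all finite:", is_finite(final_params))
```

This only works around the symptom: it does not explain why the values diverge, and the test costs in the log above already swing by several orders of magnitude well before the NaN appears.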