DNC
InvalidArgumentError (see above for traceback): Nan in summary histogram for: DNC/FF_step/dense_1/bias_0
Summary generated. Step 74400 Test cost == 31.212728500 Time == 8.57s
Summary generated. Step 74500 Test cost == 10.503337860 Time == 8.38s
Summary generated. Step 74600 Test cost == 749542.875000000 Time == 8.54s
Summary generated. Step 74700 Test cost == 866.475341797 Time == 8.72s
Summary generated. Step 74800 Test cost == 6.063397408 Time == 8.58s
Summary generated. Step 74900 Test cost == 19.894033432 Time == 8.50s
Summary generated. Step 75000 Test cost == 3009.345458984 Time == 8.53s
Model saved!
Summary generated. Step 75100 Test cost == 401.853302002 Time == 8.68s
Summary generated. Step 75200 Test cost == 201.970886230 Time == 8.54s
Summary generated. Step 75300 Test cost == 4504.419433594 Time == 8.59s
Summary generated. Step 75400 Test cost == 6352.577148438 Time == 8.56s
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1139, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1121, in _run_fn
    status, run_metadata)
  File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: DNC/FF_step/dense_1/bias_0
  [[Node: DNC/FF_step/dense_1/bias_0 = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](DNC/FF_step/dense_1/bias_0/tag, DNC/FF_step/dense_1/bias/read)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 54, in
Caused by op 'DNC/FF_step/dense_1/bias_0', defined at:
File "main.py", line 54, in
InvalidArgumentError (see above for traceback): Nan in summary histogram for: DNC/FF_step/dense_1/bias_0 [[Node: DNC/FF_step/dense_1/bias_0 = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](DNC/FF_step/dense_1/bias_0/tag, DNC/FF_step/dense_1/bias/read)]]
mldl@mldlUB1604:~/ub16_prj/DNC/src$ python3
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.__version__
'1.2.1'
Yeah, this is a well-known problem with DNCs: NaNs show up in the weights from time to time during training, and as far as I know there is no reliable fix other than restarting the training process.
I do welcome a conversation about possible ways to solve the problem.
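To make the "restart the training process" workaround less manual, one option is to check the cost after every step and roll back to the last saved checkpoint as soon as it stops being finite. The sketch below is only an illustration of that idea, not code from this repository: `train_with_rollback`, `step_fn`, `save_fn`, and `restore_fn` are hypothetical names, and the callbacks are assumed to wrap whatever the real training script does (for example `sess.run` on the train op, and `tf.train.Saver.save` / `tf.train.Saver.restore`).

```python
import math


def train_with_rollback(step_fn, save_fn, restore_fn, num_steps,
                        save_every=100, max_rollbacks=10):
    """Run training steps, rolling back to the last checkpoint on NaN/Inf.

    All three callbacks are hypothetical hooks supplied by the caller:
      step_fn(step) -> float   runs one training step and returns its cost
      save_fn(step)            saves the current model state
      restore_fn()             restores the most recently saved state
    Returns the number of rollbacks that were needed.
    """
    save_fn(0)            # make sure there is always something to restore
    last_saved = 0
    rollbacks = 0
    step = 0
    while step < num_steps:
        cost = step_fn(step)
        if not math.isfinite(cost):
            # The cost is NaN or Inf, so the weights are probably corrupted.
            rollbacks += 1
            if rollbacks > max_rollbacks:
                raise RuntimeError("too many rollbacks, giving up")
            restore_fn()
            step = last_saved   # resume from the checkpointed step
            continue
        step += 1
        if step % save_every == 0:
            save_fn(step)
            last_saved = step
    return rollbacks
```

This only automates the restart; it does not address why the NaNs appear. It also assumes that replaying from the checkpoint does not hit exactly the same batches in the same order, otherwise the same NaN may simply recur, which is why the sketch gives up after `max_rollbacks` attempts.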