finetune_alexnet_with_tensorflow
Fine-Tuning Fails With Exception Between Epoch1 and Epoch2
I have been trying to use this code to fine-tune the network to classify images from the Cifar10 dataset. However, I get the following error:
Traceback (most recent call last):
  File "/home/shashankiyer/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/shashankiyer/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/shashankiyer/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: fc6/weights_0
  [[{{node fc6/weights_0}} = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](fc6/weights_0/tag, fc6/weights/read)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "finetune.py", line 202, in
Caused by op 'fc6/weights_0', defined at:
File "finetune.py", line 137, in
InvalidArgumentError (see above for traceback): Nan in summary histogram for: fc6/weights_0 [[node fc6/weights_0 (defined at finetune.py:137) = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](fc6/weights_0/tag, fc6/weights/read)]]
These are the lines in the code that trigger it:
# Add gradients to summary
for gradient, var in grads_and_vars:
    tf.summary.histogram(var.name + '/gradient', gradient)

# Add the variables we train to the summary
for var in var_list:
    tf.summary.histogram(var.name, var)
I am running TensorFlow 1.12.0. Any pointers will be greatly appreciated.
NaN values are almost always a hint that your learning rate is too high. Try decreasing it to, e.g., 1e-3 or 1e-4.
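To illustrate why a learning rate that is too large ends in NaN, here is a minimal toy sketch in plain Python. It is not the fine-tuning code from this repository: the function name `run_gradient_descent`, the loss f(w) = w**4, the starting point, and the two learning rates are all assumptions chosen only for illustration. With the large step size the weight overshoots further on every update, overflows to infinity, and the next update computes inf - inf, which is NaN. That is the same kind of value the histogram summary refuses to write.

```python
import math


def run_gradient_descent(learning_rate, steps=50, w0=2.0):
    """Minimise the toy loss f(w) = w**4 with plain gradient descent.

    Hypothetical helper for illustration only; returns the final weight.
    """
    w = w0
    for _ in range(steps):
        # Gradient of w**4 is 4 * w**3. It is written as a product so that
        # float overflow silently yields inf instead of raising OverflowError.
        grad = 4.0 * w * w * w
        w = w - learning_rate * grad
    return w


for lr in (0.5, 1e-3):
    final_w = run_gradient_descent(lr)
    status = "NaN" if math.isnan(final_w) else "finite"
    print(f"learning_rate={lr}: final weight is {status}")
```

In a real network the blow-up is usually less abrupt, but the mechanism is similar: once any weight becomes inf or NaN, it propagates through later updates, so lowering the learning rate is a reasonable first thing to try.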