tiny-yolo-tensorflow
I cannot test.
Hi. I tried to train with the 'make train' command, but the training did not finish. The error is below:
Traceback (most recent call last):
File "/home/sounansu/anaconda3/envs/tiny-yolo-tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1350, in _do_call
return fn(*args)
File "/home/sounansu/anaconda3/envs/tiny-yolo-tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1329, in _run_fn
status, run_metadata)
File "/home/sounansu/anaconda3/envs/tiny-yolo-tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: loss
[[Node: loss = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](loss/tag, TRAINER/add_8/_95)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./train.py", line 97, in <module>
summary, _ , lossp, lxy, lwh, lobj, lnoobj, lp = sess.run([merge, trainer, loss, loss_xy, loss_wh, loss_obj, loss_noobj, loss_p], feed_dict = {X: Xp, Y1: Y1p, Y2:Y2p})
File "/home/sounansu/anaconda3/envs/tiny-yolo-tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/home/sounansu/anaconda3/envs/tiny-yolo-tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1128, in _run
feed_dict_tensor, options, run_metadata)
File "/home/sounansu/anaconda3/envs/tiny-yolo-tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1344, in _do_run
options, run_metadata)
File "/home/sounansu/anaconda3/envs/tiny-yolo-tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1363, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: loss
[[Node: loss = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](loss/tag, TRAINER/add_8/_95)]]
Caused by op 'loss', defined at:
File "./train.py", line 73, in <module>
tf.summary.histogram("loss", loss)
File "/home/sounansu/anaconda3/envs/tiny-yolo-tensorflow/lib/python3.5/site-packages/tensorflow/python/summary/summary.py", line 193, in histogram
tag=tag, values=values, name=scope)
File "/home/sounansu/anaconda3/envs/tiny-yolo-tensorflow/lib/python3.5/site-packages/tensorflow/python/ops/gen_logging_ops.py", line 189, in _histogram_summary
"HistogramSummary", tag=tag, values=values, name=name)
File "/home/sounansu/anaconda3/envs/tiny-yolo-tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/sounansu/anaconda3/envs/tiny-yolo-tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3160, in create_op
op_def=op_def)
File "/home/sounansu/anaconda3/envs/tiny-yolo-tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1625, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Nan in summary histogram for: loss
[[Node: loss = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](loss/tag, TRAINER/add_8/_95)]]
BTW, I do get checkpoint (ckpt) files:
$ ls train_graph/
checkpoint tiny-yolo-10000.ckpt.meta tiny-yolo-5000.ckpt.data-00000-of-00001 tiny-yolo-7500.ckpt.index tiny-yolo-final.ckpt.meta
events.out.tfevents.1538097615.ubuntu2 tiny-yolo-2500.ckpt.data-00000-of-00001 tiny-yolo-5000.ckpt.index tiny-yolo-7500.ckpt.meta
tiny-yolo-10000.ckpt.data-00000-of-00001 tiny-yolo-2500.ckpt.index tiny-yolo-5000.ckpt.meta tiny-yolo-final.ckpt.data-00000-of-00001
tiny-yolo-10000.ckpt.index tiny-yolo-2500.ckpt.meta tiny-yolo-7500.ckpt.data-00000-of-00001 tiny-yolo-final.ckpt.index
So I tried to test with this ckpt data using the 'make test -i data/dog.jpg' command. (I modified test.py as follows.)
12c12
< saver.restore(sess,"./train_graph/tiny-yolo-final.ckpt")
---
> saver.restore("./train_graph/tiny-yolo-final.ckpt")
33c33
< im_out = np.zeros((1, size, size, 3 ))
---
> im_out = np.zeros(1, size, size, 3)
But I cannot get a detection image:
Traceback (most recent call last):
File "./test2.py", line 56, in <module>
print(detect(im))
File "./test2.py", line 29, in detect
return sess.run(prediction, feed_dict = {X:Xp})
File "/home/sounansu/anaconda3/envs/tiny-yolo-tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/home/sounansu/anaconda3/envs/tiny-yolo-tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1104, in _run
% (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1, 416, 416, 3) for Tensor 'YOLO/input:0', which has shape '(32, 416, 416, 3)'
@sounansu It has been a long time since I last touched TensorFlow.
- The training problem can have many causes. One of them may be the dropout layer that I included. My suggestion is to read the YOLO paper to see what kinds of issues the authors ran into.
- The input shape is (32, 416, 416, 3), i.e. batch = 32. As far as I know, the underlying implementation of TensorFlow is heavily optimized for memory use. With a batch > 1, the CUDA code probably splits the records, feeds them to the GPU one at a time, and combines the results later. So I believe that TF supports changing the input batch size.
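The "Nan in summary histogram for: loss" error means the loss value itself became NaN before the summary was written. One way to narrow this down is to fetch the individual loss terms and check which one goes non-finite first. The sketch below is only a diagnostic aid under that assumption. The helper name `first_nonfinite` and the sample values are hypothetical, while the term names follow the `sess.run` call in the traceback. It does not identify the root cause.

```python
import math


def first_nonfinite(named_values):
    """Return the name of the first value that is NaN or infinite, else None.

    `named_values` is a list of (name, value) pairs, e.g. the loss terms
    fetched from one training step. (Hypothetical helper, not part of the repo.)
    """
    for name, value in named_values:
        if not math.isfinite(float(value)):
            return name
    return None


# Hypothetical values standing in for one step's fetched loss terms.
step_losses = [
    ("loss_xy", 0.42),
    ("loss_wh", float("nan")),
    ("loss_obj", 1.3),
]

bad = first_nonfinite(step_losses)
if bad is not None:
    # In a training loop one would stop here and inspect the batch,
    # instead of letting the histogram summary raise.
    print("non-finite loss term:", bad)
```

In a training loop, this check would have to run on the loss terms fetched without the merged summary, since the summary op is what raises the exception.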
I noticed the same problem. A temporary solution is to set the batch size to 1 in create_graph, but the code needs to be changed to support a dynamic batch size in the placeholder.
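Until the placeholder is changed (in TensorFlow 1.x a placeholder can declare its batch dimension as `None` to accept any batch size), another possible workaround is to pad a single image up to the fixed batch of 32 and keep only the first prediction. This is a minimal NumPy sketch of that idea. The helper `pad_to_batch` is hypothetical, and it assumes the prediction for one record does not depend on the other records in the batch, which may not hold if the graph uses batch statistics.

```python
import numpy as np

BATCH_SIZE = 32  # batch size baked into the 'YOLO/input:0' placeholder


def pad_to_batch(images, batch_size=BATCH_SIZE):
    """Pad `images` of shape (n, H, W, C) up to `batch_size` records.

    The last image is repeated as filler. Returns the padded array and
    the number of real records, so the caller can slice the output.
    (Hypothetical helper, not part of the repo.)
    """
    images = np.asarray(images)
    n = images.shape[0]
    if n == 0 or n > batch_size:
        raise ValueError(
            "expected between 1 and %d images, got %d" % (batch_size, n)
        )
    filler = np.repeat(images[-1:], batch_size - n, axis=0)
    return np.concatenate([images, filler], axis=0), n


# A single preprocessed image, shaped like the value that failed to feed.
single = np.zeros((1, 416, 416, 3), dtype=np.float32)
padded, n_real = pad_to_batch(single)
print(padded.shape, n_real)

# The padded array matches the placeholder shape; only the real rows are kept:
# out = sess.run(prediction, feed_dict={X: padded})[:n_real]
```

This leaves the trained graph and checkpoint untouched, at the cost of running 31 redundant records per image.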