PSPNet-tensorflow
Training crashes on Cityscapes
I'm training PSPNet using the provided train.py script. I've tried running it on a GTX 1080 and a Titan X, and it always crashes after about 500 steps. Log below:
```
step 590 loss = 0.266, (0.723 sec/step)
Traceback (most recent call last):
  File "train.py", line 219, in <module>
    main()
  File "train.py", line 210, in main
    loss_value, _ = sess.run([reduced_loss, train_op], feed_dict=feed_dict)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
     [[Node: create_inputs/batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](create_inputs/batch/fifo_queue, create_inputs/batch/n)]]

Caused by op u'create_inputs/batch', defined at:
  File "train.py", line 219, in <module>
    main()
  File "train.py", line 121, in main
    image_batch, label_batch = reader.dequeue(args.batch_size)
  File "/home/ml/codes/PSPNet-tensorflow/image_reader.py", line 116, in dequeue
    num_elements)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py", line 927, in batch
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py", line 722, in _batch
    dequeued = queue.dequeue_many(batch_size, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/data_flow_ops.py", line 464, in dequeue_many
    self._queue_ref, n=n, component_types=self._dtypes, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 2418, in _queue_dequeue_many_v2
    component_types=component_types, timeout_ms=timeout_ms, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

OutOfRangeError (see above for traceback): FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
     [[Node: create_inputs/batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](create_inputs/batch/fifo_queue, create_inputs/batch/n)]]
```
Maybe you need to check the data you are feeding around step 500; I guess its size is no longer the same as before, which is why the crash occurs.
I was stuck with the same error, but after a closer look and some debugging I found the solution. Check the cityscapes_train_list.txt file in the list folder and make sure it does not contain any extra/empty lines. The reader tries to use the empty line as an input but cannot find the corresponding image, since none is listed there. Hence the error "has insufficient elements (requested 1, current size 0)". It is a simple logical error and a limitation of the code. @RaceSu @kshitijagrwl
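Here is a minimal sketch of how you could check (and clean) the list file before training. It assumes the usual list format of one "image_path label_path" pair per line and a hypothetical DATA_DIR root; adjust the paths to your own setup.

```python
# Sketch: flag blank or malformed lines in list/cityscapes_train_list.txt
# (assumes each line holds "image_path label_path"; DATA_DIR is a placeholder).
import os

LIST_PATH = 'list/cityscapes_train_list.txt'
DATA_DIR = '/path/to/cityscapes'  # hypothetical data root

with open(LIST_PATH) as f:
    lines = f.readlines()

clean = []
for i, line in enumerate(lines, 1):
    stripped = line.strip()
    if not stripped:
        print('blank line at %d -- remove it' % i)
        continue
    parts = stripped.split()
    if len(parts) != 2:
        print('malformed line at %d: %r' % (i, stripped))
        continue
    # Optionally verify both files exist under the data directory.
    for p in parts:
        full = os.path.join(DATA_DIR, p.lstrip('/'))
        if not os.path.exists(full):
            print('missing file on line %d: %s' % (i, full))
    clean.append(stripped)

# Rewrite the list without the blank/malformed lines.
with open(LIST_PATH, 'w') as f:
    f.write('\n'.join(clean) + '\n')
```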