
OutOfRangeError

Open · AllentDan opened this issue 4 years ago · 4 comments

Hi. The same code runs fine with one tfrecord file, but with another tfrecord file it fails with the error below, and I can't figure out why. I'm using multi_gpu_train.py on two 1080 Ti cards.

Traceback (most recent call last):
  File "/home/allent/anaconda3/envs/tensorflow_py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
    return fn(*args)
  File "/home/allent/anaconda3/envs/tensorflow_py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1263, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/allent/anaconda3/envs/tensorflow_py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 2, current size 0)
     [[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](get_batch/batch/padding_fifo_queue, tower_0/gradients/tower_0/build_loss/FastRCNN_loss/Sum_grad/mod/_829)]]
     [[Node: tower_1/gradients/AddN_68/_8431 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:1", send_device_incarnation=1, tensor_name="edge_11930_tower_1/gradients/AddN_68", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "multi_gpu_train.py", line 346, in train() File "multi_gpu_train.py", line 303, in train _, global_stepnp = sess.run([train_op, global_step]) File "/home/allent/anaconda3/envs/tensorflow_py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 877, in run run_metadata_ptr) File "/home/allent/anaconda3/envs/tensorflow_py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1100, in _run feed_dict_tensor, options, run_metadata) File "/home/allent/anaconda3/envs/tensorflow_py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1272, in _do_run run_metadata) File "/home/allent/anaconda3/envs/tensorflow_py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1291, in _do_call raise type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 2, current size 0) [[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](get_batch/batch/padding_fifo_queue, tower_0/gradients/tower_0/build_loss/FastRCNN_loss/Sum_grad/mod/_829)]] [[Node: tower_1/gradients/AddN_68/_8431 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:1", send_device_incarnation=1, tensor_name="edge_11930_tower_1/gradients/AddN_68", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]

Caused by op 'get_batch/batch', defined at:
  File "multi_gpu_train.py", line 346, in <module>
    train()
  File "multi_gpu_train.py", line 120, in train
    is_training=True)
  File "../data/io/read_tfrecord_multi_gpu.py", line 108, in next_batch
    dynamic_pad=True)
  File "/home/allent/anaconda3/envs/tensorflow_py35/lib/python3.5/site-packages/tensorflow/python/training/input.py", line 988, in batch
    name=name)
  File "/home/allent/anaconda3/envs/tensorflow_py35/lib/python3.5/site-packages/tensorflow/python/training/input.py", line 762, in _batch
    dequeued = queue.dequeue_many(batch_size, name=name)
  File "/home/allent/anaconda3/envs/tensorflow_py35/lib/python3.5/site-packages/tensorflow/python/ops/data_flow_ops.py", line 476, in dequeue_many
    self._queue_ref, n=n, component_types=self._dtypes, name=name)
  File "/home/allent/anaconda3/envs/tensorflow_py35/lib/python3.5/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 3480, in queue_dequeue_many_v2
    component_types=component_types, timeout_ms=timeout_ms, name=name)
  File "/home/allent/anaconda3/envs/tensorflow_py35/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/allent/anaconda3/envs/tensorflow_py35/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
  File "/home/allent/anaconda3/envs/tensorflow_py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
    op_def=op_def)
  File "/home/allent/anaconda3/envs/tensorflow_py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1717, in __init__
    self._traceback = tf_stack.extract_stack()

OutOfRangeError (see above for traceback): PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 2, current size 0)
     [[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](get_batch/batch/padding_fifo_queue, tower_0/gradients/tower_0/build_loss/FastRCNN_loss/Sum_grad/mod/_829)]]
     [[Node: tower_1/gradients/AddN_68/_8431 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:1", send_device_incarnation=1, tensor_name="edge_11930_tower_1/gradients/AddN_68", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
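Since the same code works on one tfrecord but not the other, one quick check is whether the failing tfrecord is actually found by the path pattern and contains any records at all. A minimal sketch under TF 1.x; the glob pattern below is a placeholder, substitute the path the training script prints at startup:

    import glob
    import tensorflow as tf

    # Placeholder pattern; use the value printed as "tfrecord path is --> ...".
    pattern = '/path/to/FPN_Tensorflow/data/tfrecord/your_dataset_train*'

    files = glob.glob(pattern)
    print('matched tfrecord files:', files)  # an empty list already explains the error

    for f in files:
        # tf.python_io.tf_record_iterator is the TF 1.x way to walk a tfrecord file
        n = sum(1 for _ in tf.python_io.tf_record_iterator(f))
        print(f, '->', n, 'records')

If the pattern matches nothing, or a matched file holds zero records, the PaddingFIFOQueue is never filled and the dequeue fails exactly as in the traceback above.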

AllentDan avatar Dec 27 '19 11:12 AllentDan

https://github.com/yangxue0827/R2CNN_FPN_Tensorflow/issues/5 may be the solution; the error is most likely caused by some images having no gtbox when the tfrecord was converted.
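If that is the cause, the offending images can be found before conversion by scanning the annotations. A minimal sketch, assuming PASCAL-VOC-style XML files in an Annotations directory (the path is a placeholder):

    import os
    import xml.etree.ElementTree as ET

    ann_dir = '/path/to/dataset/Annotations'  # placeholder path

    no_gtbox = []
    for name in sorted(os.listdir(ann_dir)):
        if not name.endswith('.xml'):
            continue
        root = ET.parse(os.path.join(ann_dir, name)).getroot()
        # an image with no <object>/<bndbox> contributes no gtbox during conversion
        if not root.findall('.//object/bndbox'):
            no_gtbox.append(name)

    print('annotations without any gtbox:', no_gtbox)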

AllentDan avatar Dec 28 '19 01:12 AllentDan

Hi. Looking around online, others have hit similar problems, and the cause is most likely something wrong with the converted tfrecord. I have checked it many times. This is my own dataset, and I changed everything that should be changed, but it still throws this error. I really can't tell what the cause is; any help would be much appreciated.

WARNING:tensorflow:From /home/blao/FPN_Tensorflow/tools/train.py:51: get_regularization_losses (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.get_regularization_losses instead.
WARNING:tensorflow:From /home/blao/FPN_Tensorflow/tools/train.py:90: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
2020-03-03 10:58:47.079505: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-03-03 10:58:47.421232: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:1c:00.0
totalMemory: 31.72GiB freeMemory: 31.41GiB
2020-03-03 10:58:47.422364: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2020-03-03 10:58:47.926802: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-03 10:58:47.926902: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0
2020-03-03 10:58:47.926927: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N
2020-03-03 10:58:47.927222: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 30473 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:1c:00.0, compute capability: 7.0)
/home/blao/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py:108: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--
/home/blao/FPN_Tensorflow
tfrecord path is --> /home/blao/FPN_Tensorflow/data/tfrecord/rgz_train*
we are in Pyramid::-======>>>> ['P2', 'P3', 'P4', 'P5', 'P6']
base_anchor_size are: [32, 64, 128, 256, 512]


model restore from : /home/blao/FPN_Tensorflow/output/trained_weights/FPN_Res101_20181201/rgz_10000model.ckpt
restore model
Traceback (most recent call last):
  File "/home/blao/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
    return fn(*args)
  File "/home/blao/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1263, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/blao/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
     [[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](get_batch/batch/padding_fifo_queue, get_batch/batch/n)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/home/blao/FPN_Tensorflow/tools/train.py", line 186, in train() File "/home/blao/FPN_Tensorflow/tools/train.py", line 144, in train _, global_stepnp = sess.run([train_op, global_step]) File "/home/blao/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 877, in run run_metadata_ptr) File "/home/blao/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1100, in _run feed_dict_tensor, options, run_metadata) File "/home/blao/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1272, in _do_run run_metadata) File "/home/blao/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1291, in _do_call raise type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0) [[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](get_batch/batch/padding_fifo_queue, get_batch/batch/n)]]

Caused by op 'get_batch/batch', defined at:
  File "/home/blao/FPN_Tensorflow/tools/train.py", line 186, in <module>
    train()
  File "/home/blao/FPN_Tensorflow/tools/train.py", line 34, in train
    is_training=True)
  File "../data/io/read_tfrecord.py", line 98, in next_batch
    dynamic_pad=True)
  File "/home/blao/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/input.py", line 988, in batch
    name=name)
  File "/home/blao/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/input.py", line 762, in _batch
    dequeued = queue.dequeue_many(batch_size, name=name)
  File "/home/blao/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/data_flow_ops.py", line 476, in dequeue_many
    self._queue_ref, n=n, component_types=self._dtypes, name=name)
  File "/home/blao/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 3480, in queue_dequeue_many_v2
    component_types=component_types, timeout_ms=timeout_ms, name=name)
  File "/home/blao/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/blao/anaconda3/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
  File "/home/blao/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
    op_def=op_def)
  File "/home/blao/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1717, in __init__
    self._traceback = tf_stack.extract_stack()

OutOfRangeError (see above for traceback): PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
     [[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](get_batch/batch/padding_fifo_queue, get_batch/batch/n)]]

Here is one of the XML annotation files from my dataset (the header tags were stripped when pasting, so only their values remain; a quick consistency check for the surviving <object> block is sketched after it): RGZ2017 FIRSTJ103720.1+354621.png The RGZ2017 Database RGZ RGZ2017 rgzweb FIRSTJ103720.1+354621 rgzid rgz-member 132 132 3 0

    <object>
            <name>1_1</name>
            <pose>Unspecified</pose>
            <truncated>0</truncated>
            <difficult>0</difficult>
            <bndbox>
                    <xmin>57</xmin>
                    <ymin>57</ymin>
                    <xmax>73</xmax>
                    <ymax>74</ymax>
            </bndbox>
    </object>
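For a single annotation like the one above, it is worth confirming that the box lies inside the image (the pasted header values suggest a 132 x 132 x 3 image) and, just as importantly, that the class name (here 1_1) appears in the label map used by the conversion script; otherwise the object can be silently dropped and the image ends up with no gtbox. A minimal sketch, assuming the usual VOC <size> block and using placeholder values for the file name and label map:

    import xml.etree.ElementTree as ET

    # Placeholder label map; use the one your tfrecord conversion script imports.
    NAME_LABEL_MAP = {'1_1': 1}

    root = ET.parse('FIRSTJ103720.1+354621.xml').getroot()  # placeholder file name
    w = int(root.find('./size/width').text)
    h = int(root.find('./size/height').text)

    for obj in root.findall('object'):
        name = obj.find('name').text
        box = obj.find('bndbox')
        xmin, ymin = int(box.find('xmin').text), int(box.find('ymin').text)
        xmax, ymax = int(box.find('xmax').text), int(box.find('ymax').text)
        assert name in NAME_LABEL_MAP, 'class %r not in label map' % name
        assert 0 <= xmin < xmax <= w and 0 <= ymin < ymax <= h, 'box outside image'

    print('annotation looks consistent')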

lao19881213 avatar Mar 03 '20 03:03 lao19881213

I've run into the same problem and it has bothered me for a long time. Many people say it is a path issue or a problem with the XML annotation files, but I have checked with every method I could think of, can't find anything wrong, and it is still not resolved. Has anyone managed to solve this? I really don't know the cause; any help would be much appreciated.
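When the paths and XML files all look fine, it can still help to open the generated tfrecord itself and see what was actually written into it: if the converter silently skipped every image, the file exists but is empty, or its records do not carry the features the reader expects. A minimal sketch under TF 1.x; the path below is a placeholder:

    import tensorflow as tf

    path = '/path/to/data/tfrecord/your_dataset_train.tfrecord'  # placeholder

    count = 0
    for raw in tf.python_io.tf_record_iterator(path):
        if count == 0:
            # Inspect the first record's feature keys and compare them with
            # what read_tfrecord.py parses.
            example = tf.train.Example()
            example.ParseFromString(raw)
            print('feature keys:', sorted(example.features.feature.keys()))
        count += 1

    print('total records:', count)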

cathy0522 avatar Jan 27 '21 07:01 cathy0522

Same here.

justcancel avatar Mar 16 '22 09:03 justcancel