FPN_Tensorflow
OutOfRangeError
Hello. The same code runs fine with one tfrecord file but throws an error with another, and I can't figure out why. I'm using multi_gpu_train with two 1080 Ti cards.

Traceback (most recent call last):
  File "/home/allent/anaconda3/envs/tensorflow_py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
    return fn(*args)
  File "/home/allent/anaconda3/envs/tensorflow_py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1263, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/allent/anaconda3/envs/tensorflow_py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 2, current size 0)
  [[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](get_batch/batch/padding_fifo_queue, tower_0/gradients/tower_0/build_loss/FastRCNN_loss/Sum_grad/mod/_829)]]
  [[Node: tower_1/gradients/AddN_68/_8431 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:1", send_device_incarnation=1, tensor_name="edge_11930_tower_1/gradients/AddN_68", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "multi_gpu_train.py", line 346, in

Caused by op 'get_batch/batch', defined at:
  File "multi_gpu_train.py", line 346, in
OutOfRangeError (see above for traceback): PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 2, current size 0)
  [[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](get_batch/batch/padding_fifo_queue, tower_0/gradients/tower_0/build_loss/FastRCNN_loss/Sum_grad/mod/_829)]]
  [[Node: tower_1/gradients/AddN_68/_8431 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:1", send_device_incarnation=1, tensor_name="edge_11930_tower_1/gradients/AddN_68", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
https://github.com/yangxue0827/R2CNN_FPN_Tensorflow/issues/5 may be the fix. The likely cause is that some images had no gtbox when the tfrecord was converted.
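If that is the cause, one way to confirm it is to scan the annotations for files that would yield an empty gtbox before regenerating the tfrecord. A minimal sketch, assuming PASCAL VOC-style XML files; the annotation directory path is a placeholder:

```python
# Sketch: flag VOC-style annotation files that contain no usable gtbox,
# i.e. no <object> element at all, or an <object> without a <bndbox>.
import os
import xml.etree.ElementTree as ET

ANNOTATION_DIR = '/path/to/Annotations'  # placeholder; point at your own xmls

for fname in sorted(os.listdir(ANNOTATION_DIR)):
    if not fname.endswith('.xml'):
        continue
    root = ET.parse(os.path.join(ANNOTATION_DIR, fname)).getroot()
    objects = root.findall('object')
    if not objects:
        print('no <object> element:', fname)
    elif any(obj.find('bndbox') is None for obj in objects):
        print('<object> without <bndbox>:', fname)
```

Any file this prints should be dropped from the image list (or given a valid box) before the tfrecord is regenerated.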
Hello. I've seen online that others hit similar problems, and the main cause is supposedly a bad converted tfrecord. I've checked mine many times. This is my own dataset; I changed everything that needed changing, but I still get the same error. I really don't know why; any help would be appreciated.

WARNING:tensorflow:From /home/blao/FPN_Tensorflow/tools/train.py:51: get_regularization_losses (from tensorflow.contrib.losses.python.losses.loss_ops) is deprecated and will be removed after 2016-12-30.
Instructions for updating:
Use tf.losses.get_regularization_losses instead.
WARNING:tensorflow:From /home/blao/FPN_Tensorflow/tools/train.py:90: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
2020-03-03 10:58:47.079505: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-03-03 10:58:47.421232: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:1c:00.0
totalMemory: 31.72GiB freeMemory: 31.41GiB
2020-03-03 10:58:47.422364: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2020-03-03 10:58:47.926802: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-03 10:58:47.926902: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0
2020-03-03 10:58:47.926927: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N
2020-03-03 10:58:47.927222: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 30473 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:1c:00.0, compute capability: 7.0)
/home/blao/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py:108: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--++--
/home/blao/FPN_Tensorflow
tfrecord path is --> /home/blao/FPN_Tensorflow/data/tfrecord/rgz_train*
we are in Pyramid::-======>>>> ['P2', 'P3', 'P4', 'P5', 'P6']
base_anchor_size are: [32, 64, 128, 256, 512]
model restore from : /home/blao/FPN_Tensorflow/output/trained_weights/FPN_Res101_20181201/rgz_10000model.ckpt
restore model
Traceback (most recent call last):
  File "/home/blao/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
    return fn(*args)
  File "/home/blao/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1263, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/blao/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
  [[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](get_batch/batch/padding_fifo_queue, get_batch/batch/n)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/blao/FPN_Tensorflow/tools/train.py", line 186, in

Caused by op 'get_batch/batch', defined at:
  File "/home/blao/FPN_Tensorflow/tools/train.py", line 186, in
OutOfRangeError (see above for traceback): PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
  [[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](get_batch/batch/padding_fifo_queue, get_batch/batch/n)]]
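This error only says the input queue ran dry; it usually means the reader thread found no valid examples to enqueue. A quick way to rule out an empty or corrupt tfrecord is to iterate over it directly with the TF 1.x tf.python_io API, matching the TensorFlow version in the logs above. A minimal sketch, reusing the rgz_train* pattern printed in the log:

```python
# Sketch: count and parse every record in the tfrecord. Zero records, or a
# parse failure, would explain the queue closing with nothing enqueued.
import glob
import tensorflow as tf

PATTERN = '/home/blao/FPN_Tensorflow/data/tfrecord/rgz_train*'  # from the log above

total = 0
for path in glob.glob(PATTERN):
    for record in tf.python_io.tf_record_iterator(path):
        example = tf.train.Example()
        example.ParseFromString(record)  # raises DecodeError on a corrupt record
        total += 1
    print(path, '-> ok')
print('total records parsed:', total)
```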
Here is one of the XML annotation files from my dataset:
<object>
  <name>1_1</name>
  <pose>Unspecified</pose>
  <truncated>0</truncated>
  <difficult>0</difficult>
  <bndbox>
    <xmin>57</xmin>
    <ymin>57</ymin>
    <xmax>73</xmax>
    <ymax>74</ymax>
  </bndbox>
</object>
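That annotation looks well-formed (xmin < xmax and ymin < ymax), but conversion scripts often silently drop degenerate or inverted boxes, and an image whose boxes are all dropped ends up with no gtbox. A hedged sketch for checking every box in one file; 'example.xml' is a placeholder path:

```python
# Sketch: report degenerate boxes (xmin >= xmax or ymin >= ymax) that a
# conversion script might silently filter out, leaving an empty gtbox.
import xml.etree.ElementTree as ET

def degenerate_boxes(xml_path):
    bad = []
    for obj in ET.parse(xml_path).findall('object'):
        box = obj.find('bndbox')
        xmin, ymin, xmax, ymax = (int(float(box.find(tag).text))
                                  for tag in ('xmin', 'ymin', 'xmax', 'ymax'))
        if xmin >= xmax or ymin >= ymax:
            bad.append((xmin, ymin, xmax, ymax))
    return bad

print(degenerate_boxes('example.xml'))  # placeholder; the box above passes
```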
I've been stuck on the same problem for a long time. Many people say it's the paths or the XML annotation files, but I've checked both in every way I can think of, found nothing wrong, and the error persists. Has anyone solved this? I really don't know the cause; any help would be appreciated.
Same here.