tensorflow-deeplab-resnet

Results of running inference.py

sunbin1205 opened this issue 6 years ago · 8 comments

When I run inference.py on a test image, the segmentation result shows no complete outline of the object, just a lot of scattered color spots. How can I solve this? Looking forward to your reply! @DrSleep

sunbin1205 avatar Jan 23 '18 02:01 sunbin1205

The link provided in the README only gives me the deeplab_resnet_init.ckpt file, not the pre-trained model file. Is this the reason the inference results are inaccurate? Could you please provide the pre-trained model? Thank you very much! @DrSleep

sunbin1205 avatar Jan 23 '18 03:01 sunbin1205

There is a deeplab_resnet.ckpt provided; you need to download it and run inference with it.
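In case it helps, a small sanity check (TensorFlow 1.x, which this repo targets) to confirm which checkpoint was actually downloaded — deeplab_resnet.ckpt should contain the full set of trained weights, whereas deeplab_resnet_init.ckpt only holds the initialisation weights used to start training. The path below is an assumption; point it at wherever you saved the file:

```python
import tensorflow as tf

# Inspect a downloaded checkpoint and list the variables it stores.
reader = tf.train.NewCheckpointReader('./deeplab_resnet.ckpt')
var_shapes = reader.get_variable_to_shape_map()

print('checkpoint contains %d variables' % len(var_shapes))
for name, shape in sorted(var_shapes.items())[:10]:  # show the first few
    print(name, shape)
```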

DrSleep avatar Jan 24 '18 01:01 DrSleep

I am very glad to receive your reply — the previous problem has been solved! But I have run into another problem: I downloaded the SegmentationClassAug dataset into the dataset directory, and I get this error when I run train.py:

step 62 loss = 1.486, (62.292 sec/step)
step 63 loss = 1.632, (61.552 sec/step)
2018-01-25 02:10:06.684534: W tensorflow/core/framework/op_kernel.cc:1152] Not found: ./dataset/VOCdevkit/JPEGImages/2007_000032.jpg
step 64 loss = 1.602, (62.308 sec/step)
step 65 loss = 1.555, (61.499 sec/step)
step 66 loss = 1.723, (62.328 sec/step)
Traceback (most recent call last):
  File "train.py", line 258, in <module>
    main()
  File "train.py", line 251, in main
    loss_value, _ = sess.run([reduced_loss, train_op], feed_dict=feed_dict)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 778, in run
    run_metadata_ptr)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 982, in _run
    feed_dict_string, options, run_metadata)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1032, in _do_run
    target_list, options, run_metadata)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1052, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 10, current size 5)
[[Node: create_inputs/batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](create_inputs/batch/fifo_queue, create_inputs/batch/n)]]

Caused by op u'create_inputs/batch', defined at:
  File "train.py", line 258, in <module>
    main()
  File "train.py", line 146, in main
    image_batch, label_batch = reader.dequeue(args.batch_size)
  File "/media/Linux/sun/Segmentation/tensorflow-deeplab-resnet-master/deeplab_resnet/image_reader.py", line 179, in dequeue
    num_elements)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 917, in batch
    name=name)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 712, in _batch
    dequeued = queue.dequeue_many(batch_size, name=name)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/ops/data_flow_ops.py", line 458, in dequeue_many
    self._queue_ref, n=n, component_types=self._dtypes, name=name)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1328, in _queue_dequeue_many_v2
    timeout_ms=timeout_ms, name=name)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
    self._traceback = _extract_stack()

OutOfRangeError (see above for traceback): FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 10, current size 5)
[[Node: create_inputs/batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](create_inputs/batch/fifo_queue, create_inputs/batch/n)]]

This looks like the result of an incorrect dataset. I tried deleting the offending paths from train.txt, but there seem to be too many of them, and deleting paths does not feel like the right fix. What is the reason for this?

sunbin1205 avatar Jan 24 '18 03:01 sunbin1205

Make sure that all the images from the training list are present.

Otherwise, you will be getting this error:

Not found: ./dataset/VOCdevkit/JPEGImages/2007_000032.jpg
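A quick way to find which files are missing — a minimal sketch that assumes the list format this repo uses (image path and mask path separated by a space, both relative to the data directory):

```python
import os

# Hypothetical paths; adjust to your own --data-dir and list file.
data_dir = './dataset/VOCdevkit'
list_path = './dataset/train.txt'

missing = []
with open(list_path) as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        image_rel, label_rel = line.split()
        for rel in (image_rel, label_rel):
            # Entries in the list start with '/', so join them manually.
            path = data_dir + rel if rel.startswith('/') else os.path.join(data_dir, rel)
            if not os.path.exists(path):
                missing.append(path)

print('%d missing files' % len(missing))
for p in missing[:20]:
    print(p)
```

Anything reported as missing needs to be downloaded (e.g. the SegmentationClassAug masks or the JPEG images); otherwise the reader threads fail and the FIFOQueue eventually closes with the OutOfRangeError shown above.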


DrSleep avatar Jan 24 '18 03:01 DrSleep

@DrSleep I'm sorry to disturb you again. Running python fine_tune.py --not-restore-last gives an error. It seems the jpg and png files can be loaded, so what is the problem? I guess the reason may be CUDA?

2018-01-26 03:50:20.103520: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2018-01-26 03:50:20.127612: E tensorflow/stream_executor/cuda/cuda_driver.cc:405] failed call to cuInit: CUDA_ERROR_NO_DEVICE
2018-01-26 03:50:20.127687: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: zbw-System-Product-Name
2018-01-26 03:50:20.127700: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: zbw-System-Product-Name
2018-01-26 03:50:20.127740: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 387.26.0
2018-01-26 03:50:20.127773: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:369] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 384.90 Tue Sep 19 19:17:35 PDT 2017 GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.5) """
2018-01-26 03:50:20.127796: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 384.90.0
2018-01-26 03:50:20.127807: E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:303] kernel version 384.90.0 does not match DSO version 387.26.0 -- cannot find working devices in this configuration
Restored model parameters from ./deeplab_resnet.ckpt
Traceback (most recent call last):
  File "fine_tune.py", line 207, in <module>
    main()
  File "fine_tune.py", line 196, in main
    loss_value, images, labels, preds, summary, _ = sess.run([reduced_loss, image_batch, label_batch, pred, total_summary, optim])
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 778, in run
    run_metadata_ptr)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 982, in _run
    feed_dict_string, options, run_metadata)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1032, in _do_run
    target_list, options, run_metadata)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1052, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 2, current size 0)
[[Node: create_inputs/batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](create_inputs/batch/fifo_queue, create_inputs/batch/n)]]

Caused by op u'create_inputs/batch', defined at:
  File "fine_tune.py", line 207, in <module>
    main()
  File "fine_tune.py", line 125, in main
    image_batch, label_batch = reader.dequeue(args.batch_size)
  File "/media/Linux/sun/Segmentation/building_segmentation/deeplab_resnet/image_reader.py", line 179, in dequeue
    num_elements)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 917, in batch
    name=name)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 712, in _batch
    dequeued = queue.dequeue_many(batch_size, name=name)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/ops/data_flow_ops.py", line 458, in dequeue_many
    self._queue_ref, n=n, component_types=self._dtypes, name=name)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1328, in _queue_dequeue_many_v2
    timeout_ms=timeout_ms, name=name)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/zbw/anaconda2/envs/sun/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
    self._traceback = _extract_stack()

OutOfRangeError (see above for traceback): FIFOQueue '_1_create_inputs/batch/fifo_queue' is closed and has insufficient elements (requested 2, current size 0)
[[Node: create_inputs/batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](create_inputs/batch/fifo_queue, create_inputs/batch/n)]]

sunbin1205 avatar Jan 25 '18 04:01 sunbin1205

Hi @DrSleep,

I also ran into a problem with inference.py. I retrained the model using the following setup (I changed the batch size to 4 because the original batch size ran out of memory):

BATCH_SIZE = 4
DATA_DIRECTORY = '/PASCAL/SemanticImg'
DATA_LIST_PATH = './dataset/train.txt'
IGNORE_LABEL = 255
INPUT_SIZE = '321,321'
LEARNING_RATE = 2.5e-4
MOMENTUM = 0.9
NUM_CLASSES = 21
NUM_STEPS = 20001
POWER = 0.9
RANDOM_SEED = 1234
RESTORE_FROM = './deeplab_resnet.ckpt'
SAVE_NUM_IMAGES = 2
SAVE_PRED_EVERY = 1000
SNAPSHOT_DIR = './snapshots/'
WEIGHT_DECAY = 0.0005

The train.txt is as follows:

/JPEGImages/2007_000032.jpg /SegmentationClass/2007_000032.png
/JPEGImages/2007_000039.jpg /SegmentationClass/2007_000039.png
/JPEGImages/2007_000063.jpg /SegmentationClass/2007_000063.png
/JPEGImages/2007_000068.jpg /SegmentationClass/2007_000068.png
/JPEGImages/2007_000121.jpg /SegmentationClass/2007_000121.png
/JPEGImages/2007_000170.jpg /SegmentationClass/2007_000170.png
/JPEGImages/2007_000241.jpg /SegmentationClass/2007_000241.png
/JPEGImages/2007_000243.jpg /SegmentationClass/2007_000243.png
/JPEGImages/2007_000250.jpg /SegmentationClass/2007_000250.png
/JPEGImages/2007_000256.jpg /SegmentationClass/2007_000256.png
/JPEGImages/2007_000333.jpg /SegmentationClass/2007_000333.png
/JPEGImages/2007_000363.jpg /SegmentationClass/2007_000363.png
/JPEGImages/2007_000364.jpg /SegmentationClass/2007_000364.png
/JPEGImages/2007_000392.jpg /SegmentationClass/2007_000392.png
/JPEGImages/2007_000480.jpg /SegmentationClass/2007_000480.png
/JPEGImages/2007_000504.jpg /SegmentationClass/2007_000504.png
/JPEGImages/2007_000515.jpg /SegmentationClass/2007_000515.png
/JPEGImages/2007_000528.jpg /SegmentationClass/2007_000528.png
/JPEGImages/2007_000549.jpg /SegmentationClass/2007_000549.png
/JPEGImages/2007_000584.jpg /SegmentationClass/2007_000584.png
/JPEGImages/2007_000645.jpg /SegmentationClass/2007_000645.png
/JPEGImages/2007_000648.jpg /SegmentationClass/2007_000648.png
/JPEGImages/2007_000713.jpg /SegmentationClass/2007_000713.png
/JPEGImages/2007_000720.jpg /SegmentationClass/2007_000720.png
/JPEGImages/2007_000733.jpg /SegmentationClass/2007_000733.png
/JPEGImages/2007_000738.jpg /SegmentationClass/2007_000738.png
/JPEGImages/2007_000768.jpg /SegmentationClass/2007_000768.png
...

The total number of training images is 1464; the image names come from the VOC2012 train list.

The log is as follows:

Restored model parameters from ./deeplab_resnet.ckpt
The checkpoint has been created.
step 0 loss = 1.268, (16.858 sec/step)
step 1 loss = 3.971, (1.734 sec/step)
step 2 loss = 1.308, (1.339 sec/step)
step 3 loss = 2.991, (1.329 sec/step)
step 4 loss = 1.252, (1.346 sec/step)
step 5 loss = 1.344, (1.335 sec/step)
step 6 loss = 8.126, (1.331 sec/step)
step 7 loss = 4.652, (1.339 sec/step)
step 8 loss = 5.097, (1.339 sec/step)
step 9 loss = 1.318, (1.334 sec/step)
step 10 loss = 1.769, (1.353 sec/step)
...
step 19990 loss = 1.191, (1.419 sec/step)
step 19991 loss = 1.183, (1.425 sec/step)
step 19992 loss = 1.197, (1.424 sec/step)
step 19993 loss = 1.183, (1.422 sec/step)
step 19994 loss = 1.184, (1.408 sec/step)
step 19995 loss = 1.192, (1.416 sec/step)
step 19996 loss = 1.183, (1.419 sec/step)
step 19997 loss = 1.183, (1.414 sec/step)
step 19998 loss = 1.183, (1.420 sec/step)
step 19999 loss = 1.183, (1.437 sec/step)
The checkpoint has been created.
step 20000 loss = 1.183, (12.276 sec/step)

It looks normal, right?

But when I run inference.py, the results are all background, even when I use a training image. If I instead use the original deeplab_resnet.ckpt for inference, the output shows both background and the segmented object.

Do you have any suggestion about this error?

Thank you!

Jingyao12 avatar Feb 09 '18 14:02 Jingyao12

@sunbin1205

E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:303] kernel version 384.90.0 does not match DSO version 387.26.0 -- cannot find working devices in this configuration

Something with the drivers I assume. Can't help with that one.
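As a quick check, this standard TensorFlow 1.x call lists the devices the session can actually use; if only CPU entries show up, the driver mismatch above needs to be fixed first (e.g. reinstall a matching NVIDIA driver and reboot):

```python
from tensorflow.python.client import device_lib

# Prints every device TensorFlow can see; a working GPU setup should
# include at least one entry with device_type == 'GPU'.
for device in device_lib.list_local_devices():
    print(device.name, device.device_type)
```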

@minnieyao You are restoring from the already pre-trained model, so the learning rate might be too high. Try restoring from the init model instead and check the progress in TensorBoard; it should be alright.
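For what it's worth, a rough sketch of what restoring from the init model without the last layer amounts to. The 'fc1_voc12' scope filter is an assumption based on this repo's layer naming; check the variable names in your own graph before relying on it:

```python
import tensorflow as tf

def build_init_loader(exclude_scope='fc1_voc12'):
    """Return a Saver that restores every variable except the final
    classification layer, so that layer can be re-learned for your classes.
    The scope name is an assumption taken from this repo's naming."""
    restore_vars = [v for v in tf.global_variables()
                    if exclude_scope not in v.name]
    return tf.train.Saver(var_list=restore_vars)

# Usage sketch (after the training graph has been built):
#   loader = build_init_loader()
#   with tf.Session() as sess:
#       sess.run(tf.global_variables_initializer())
#       loader.restore(sess, './deeplab_resnet_init.ckpt')
```

Restoring this way keeps the pre-trained backbone but lets the classification head start from scratch at the configured learning rate.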

DrSleep avatar Apr 08 '18 04:04 DrSleep

Hi @minnieyao, I have the same problem. Did you solve it? I restored from deeplab_resnet_init.ckpt.

RaphaelDuan avatar Aug 26 '19 09:08 RaphaelDuan