
Caught OutOfRangeError. Stopping Training.

Open l-chenyao opened this issue 8 years ago • 0 comments

Hi @datitran,

I followed your steps and want to train on my local machine. The relevant part of my config is below.

fine_tune_checkpoint: "F:\GitHub\ssd_mobilenet_v1_coco_11_06_2017\model.ckpt"
from_detection_checkpoint: true
data_augmentation_options {
  random_horizontal_flip {
  }
}
data_augmentation_options {
  ssd_random_crop {
  }
}
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "F:\GitHub\raccoon_dataset-master\data\train.record"
  }
  label_map_path: "F:\GitHub\raccoon_dataset-master\training\object-detection.pbtxt"
}

eval_config: {
  num_examples: 40
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "F:\GitHub\raccoon_dataset-master\data\test.record"
  }
  label_map_path: "F:\GitHub\raccoon_dataset-master\training\object-detection.pbtxt"
  shuffle: false
  num_readers: 1
}

My environment is Windows 7 with a GTX 1060 and 8 GB of memory. Training always stops with the message "INFO:tensorflow:Caught OutOfRangeError. Stopping Training." Do you know why this happens? The full output is below. Thank you very much.

F:\models-master>python object_detection/train.py --logtostderr --pipeline_config_path=F:\GitHub\raccoon_dataset-master\training\ssd_mobilenet_v1_pets.config --train_dir=F:\train_dir
INFO:tensorflow:Summary name Learning Rate is illegal; using Learning_Rate instead.
WARNING:tensorflow:From F:\GitHub\models-master\object_detection\meta_architectures\ssd_meta_arch.py:607: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Please use tf.global_variables instead.
INFO:tensorflow:Summary name /clone_loss is illegal; using clone_loss instead.
2017-09-02 09:50:37.901800: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 09:50:37.901800: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 09:50:37.901800: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 09:50:37.901800: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 09:50:37.902800: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 09:50:37.902800: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 09:50:37.902800: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 09:50:37.903800: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 09:50:38.065800: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 1060 6GB
major: 6 minor: 1 memoryClockRate (GHz) 1.7085
pciBusID 0000:01:00.0
Total memory: 6.00GiB
Free memory: 5.55GiB
2017-09-02 09:50:38.067800: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:961] DMA: 0
2017-09-02 09:50:38.068800: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:971] 0: Y
2017-09-02 09:50:38.068800: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0)
2017-09-02 09:50:44.297800: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\simple_placer.cc:675] Ignoring device specification /device:GPU:0 for node 'prefetch_queue_Dequeue' because the input edge from 'prefetch_queue' is a reference connection and already has a device field set to /device:CPU:0
INFO:tensorflow:Restoring parameters from F:\GitHub\ssd_mobilenet_v1_coco_11_06_2017\model.ckpt
INFO:tensorflow:Starting Session.

     [[Node: prefetch_queue_Dequeue = QueueDequeueV2[component_types=[DT_INT32, DT_STRING, DT_INT32, DT_FLOAT, DT_BOOL, DT_FLOAT, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_FLOAT, DT_STRING, DT_INT64, DT_INT64, DT_STRING, DT_INT64, DT_BOOL, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](prefetch_queue)]]
2017-09-02 09:48:05.589800: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\framework\op_kernel.cc:1158] Out of range: FIFOQueue '_6_prefetch_queue' is closed and has insufficient elements (requested 1, current size 0)
     [[Node: prefetch_queue_Dequeue = QueueDequeueV2[component_types=[DT_INT32, DT_STRING, DT_INT32, DT_FLOAT, DT_BOOL, DT_FLOAT, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_FLOAT, DT_STRING, DT_INT64, DT_INT64, DT_STRING, DT_INT64, DT_BOOL, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](prefetch_queue)]]
2017-09-02 09:48:05.588800: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\framework\op_kernel.cc:1158] Out of range: FIFOQueue '_6_prefetch_queue' is closed and has insufficient elements (requested 1, current size 0)
     [[Node: prefetch_queue_Dequeue = QueueDequeueV2[component_types=[DT_INT32, DT_STRING, DT_INT32, DT_FLOAT, DT_BOOL, DT_FLOAT, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_FLOAT, DT_STRING, DT_INT64, DT_INT64, DT_STRING, DT_INT64, DT_BOOL, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](prefetch_queue)]]
2017-09-02 09:48:05.590800: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\framework\op_kernel.cc:1158] Out of range: FIFOQueue '_6_prefetch_queue' is closed and has insufficient elements (requested 1, current size 0)
     [[Node: prefetch_queue_Dequeue = QueueDequeueV2[component_types=[DT_INT32, DT_STRING, DT_INT32, DT_FLOAT, DT_BOOL, DT_FLOAT, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_FLOAT, DT_STRING, DT_INT64, DT_INT64, DT_STRING, DT_INT64, DT_BOOL, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](prefetch_queue)]]
2017-09-02 09:48:05.587800: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\framework\op_kernel.cc:1158] Out of range: FIFOQueue '_6_prefetch_queue' is closed and has insufficient elements (requested 1, current size 0)
     [[Node: prefetch_queue_Dequeue = QueueDequeueV2[component_types=[DT_INT32, DT_STRING, DT_INT32, DT_FLOAT, DT_BOOL, DT_FLOAT, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_FLOAT, DT_STRING, DT_INT64, DT_INT64, DT_STRING, DT_INT64, DT_BOOL, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](prefetch_queue)]]

INFO:tensorflow:Caught OutOfRangeError. Stopping Training.
INFO:tensorflow:Finished training! Saving model to disk.
Traceback (most recent call last):
  File "object_detection/train.py", line 198, in <module>
    tf.app.run()
  File "C:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "object_detection/train.py", line 194, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "F:\GitHub\models-master\object_detection\trainer.py", line 296, in train
    saver=saver)
  File "C:\Program Files\Anaconda3\lib\site-packages\tensorflow\contrib\slim\python\slim\learning.py", line 759, in train
    sv.saver.save(sess, sv.save_path, global_step=sv.global_step)
  File "C:\Program Files\Anaconda3\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\training\supervisor.py", line 964, in managed_session
    self.stop(close_summary_writer=close_summary_writer)
  File "C:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\training\supervisor.py", line 792, in stop
    stop_grace_period_secs=self._stop_grace_secs)
  File "C:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\training\coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "C:\Program Files\Anaconda3\lib\site-packages\six.py", line 686, in reraise
    raise value
  File "C:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\training\queue_runner_impl.py", line 238, in _run
    enqueue_callable()
  File "C:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1063, in _single_operation_run
    target_list_as_strings, status, None)
  File "C:\Program Files\Anaconda3\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
accoon_dataset-master\data
rain.record : \udcce\u013c\udcfe\udcc3\udcfb\udca1\udca2\u013f\xbc\udcc3\udcfb\udcbb\udcf2\udcbe\udced\udcb1\udcea\udcd3\ufde8\udcb2\udcbb\udcd5\udcfd\u0237\udca1\udca3
     [[Node: parallel_read/ReaderReadV2_1 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/cpu:0"](parallel_read/TFRecordReaderV2_1, parallel_read/filenames)]]
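
One thing worth checking, though it is not confirmed anywhere in this thread: the reader error prints the record path as "accoon_dataset-master\data" followed by "rain.record". That is what the path would look like if the `\r` and `\t` in the configured input_path had been read as a carriage return and a tab. The Python sketch below only imitates C-style escape processing with the standard `unicode_escape` codec. It assumes the config parser treats quoted strings in a similar way, and the names `c_unescape` and `forward_slashes` are made up for this illustration.

```python
import warnings

# Path exactly as written in input_path (raw string keeps the backslashes literal).
CONFIGURED_PATH = r"F:\GitHub\raccoon_dataset-master\data\train.record"


def c_unescape(text):
    """Imitate C-style escape processing with Python's unicode_escape codec.

    Assumption: the config parser handles quoted strings in a similar way.
    This is an illustration, not the parser's own code.
    """
    with warnings.catch_warnings():
        # Unknown escapes such as \G and \d are kept as-is but may emit a warning.
        warnings.simplefilter("ignore")
        return text.encode("ascii").decode("unicode_escape")


def forward_slashes(path):
    """Hypothetical helper: rewrite a Windows path so it contains no backslashes."""
    return path.replace("\\", "/")


decoded = c_unescape(CONFIGURED_PATH)
# repr() makes the carriage return and tab visible instead of printing them raw.
print(repr(decoded))

safe = forward_slashes(CONFIGURED_PATH)
print(safe)
# A path without backslashes passes through escape processing unchanged.
print(c_unescape(safe) == safe)
```

If that assumption holds, writing the paths in the config with forward slashes, or with doubled backslashes, should keep them intact so the reader can open the record files.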

l-chenyao, Aug 31 '17 15:08