raccoon_dataset
Caught OutOfRangeError. Stopping Training.
Hi @datitran,
I followed your steps and want to train on my local machine. The relevant part of my config is below:
  fine_tune_checkpoint: "F:\GitHub\ssd_mobilenet_v1_coco_11_06_2017\model.ckpt"
  from_detection_checkpoint: true
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}
train_input_reader: {
  tf_record_input_reader {
    input_path: "F:\GitHub\raccoon_dataset-master\data\train.record"
  }
  label_map_path: "F:\GitHub\raccoon_dataset-master\training\object-detection.pbtxt"
}
eval_config: {
  num_examples: 40
}
eval_input_reader: {
  tf_record_input_reader {
    input_path: "F:\GitHub\raccoon_dataset-master\data\test.record"
  }
  label_map_path: "F:\GitHub\raccoon_dataset-master\training\object-detection.pbtxt"
  shuffle: false
  num_readers: 1
}
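One thing worth ruling out with Windows paths inside a quoted config string is backslash escaping. The sketch below is only an illustration: it uses Python's `unicode_escape` codec as a stand-in for C-style escape handling, and whether the config parser reads these strings the same way is an assumption, not something the log confirms. `decode_escapes` is a hypothetical helper name.

```python
import warnings


def decode_escapes(literal):
    """Decode C-style backslash escapes with Python's unicode_escape codec.

    Assumption: this only approximates how a config parser might read a
    quoted string; it is not the parser itself.
    """
    with warnings.catch_warnings():
        # Unrecognised escapes such as \G are kept as-is but may emit a warning.
        warnings.simplefilter("ignore")
        return literal.encode("ascii").decode("unicode_escape")


if __name__ == "__main__":
    for path in (
        r"F:\GitHub\ssd_mobilenet_v1_coco_11_06_2017\model.ckpt",
        r"F:\GitHub\raccoon_dataset-master\data\train.record",
    ):
        # repr() makes any control characters visible.
        print(repr(decode_escapes(path)))
```

Under this assumption the checkpoint path comes through unchanged, while the `\r` and `\t` sequences in the record path turn into control characters. Writing the paths with forward slashes or doubled backslashes sidesteps the question.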
My environment is Windows 7 with a GTX 1060 and 8 GB of memory. Training always stops with the message "INFO:tensorflow:Caught OutOfRangeError. Stopping Training." Do you know why this happens? The output is below. Thank you very much.
F:\models-master>python object_detection/train.py --logtostderr --pipeline_config_path=F:\GitHub\raccoon_dataset-master\training\ssd_mobilenet_v1_pets.config --train_dir=F:\train_dir
INFO:tensorflow:Summary name Learning Rate is illegal; using Learning_Rate instead.
WARNING:tensorflow:From F:\GitHub\models-master\object_detection\meta_architectures\ssd_meta_arch.py:607: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Please use tf.global_variables instead.
INFO:tensorflow:Summary name /clone_loss is illegal; using clone_loss instead.
2017-09-02 09:50:37.901800: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 09:50:37.901800: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 09:50:37.901800: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 09:50:37.901800: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 09:50:37.902800: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 09:50:37.902800: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 09:50:37.902800: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 09:50:37.903800: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-09-02 09:50:38.065800: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 1060 6GB
major: 6 minor: 1 memoryClockRate (GHz) 1.7085
pciBusID 0000:01:00.0
Total memory: 6.00GiB
Free memory: 5.55GiB
2017-09-02 09:50:38.067800: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:961] DMA: 0
2017-09-02 09:50:38.068800: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:971] 0: Y
2017-09-02 09:50:38.068800: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0)
2017-09-02 09:50:44.297800: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\simple_placer.cc:675] Ignoring device specification /device:GPU:0 for node 'prefetch_queue_Dequeue' because the input edge from 'prefetch_queue' is a reference connection and already has a device field set to /device:CPU:0
INFO:tensorflow:Restoring parameters from F:\GitHub\ssd_mobilenet_v1_coco_11_06_2017\model.ckpt
INFO:tensorflow:Starting Session.
[[Node: prefetch_queue_Dequeue = QueueDequeueV2[component_types=[DT_INT32, DT_STRING, DT_INT32, DT_FLOAT, DT_BOOL, DT_FLOAT, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_FLOAT, DT_STRING, DT_INT64, DT_INT64, DT_STRING, DT_INT64, DT_BOOL, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](prefetch_queue)]]
2017-09-02 09:48:05.589800: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\framework\op_kernel.cc:1158] Out of range: FIFOQueue '_6_prefetch_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: prefetch_queue_Dequeue = QueueDequeueV2[component_types=[DT_INT32, DT_STRING, DT_INT32, DT_FLOAT, DT_BOOL, DT_FLOAT, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_FLOAT, DT_STRING, DT_INT64, DT_INT64, DT_STRING, DT_INT64, DT_BOOL, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](prefetch_queue)]]
2017-09-02 09:48:05.588800: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\framework\op_kernel.cc:1158] Out of range: FIFOQueue '_6_prefetch_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: prefetch_queue_Dequeue = QueueDequeueV2[component_types=[DT_INT32, DT_STRING, DT_INT32, DT_FLOAT, DT_BOOL, DT_FLOAT, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_FLOAT, DT_STRING, DT_INT64, DT_INT64, DT_STRING, DT_INT64, DT_BOOL, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](prefetch_queue)]]
2017-09-02 09:48:05.590800: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\framework\op_kernel.cc:1158] Out of range: FIFOQueue '_6_prefetch_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: prefetch_queue_Dequeue = QueueDequeueV2[component_types=[DT_INT32, DT_STRING, DT_INT32, DT_FLOAT, DT_BOOL, DT_FLOAT, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_FLOAT, DT_STRING, DT_INT64, DT_INT64, DT_STRING, DT_INT64, DT_BOOL, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](prefetch_queue)]]
2017-09-02 09:48:05.587800: W c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\framework\op_kernel.cc:1158] Out of range: FIFOQueue '_6_prefetch_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: prefetch_queue_Dequeue = QueueDequeueV2[component_types=[DT_INT32, DT_STRING, DT_INT32, DT_FLOAT, DT_BOOL, DT_FLOAT, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_FLOAT, DT_STRING, DT_INT64, DT_INT64, DT_STRING, DT_INT64, DT_BOOL, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](prefetch_queue)]]
INFO:tensorflow:Caught OutOfRangeError. Stopping Training.
INFO:tensorflow:Finished training! Saving model to disk.
Traceback (most recent call last):
File "object_detection/train.py", line 198, in
[[Node: parallel_read/ReaderReadV2_1 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/cpu:0"](parallel_read/TFRecordReaderV2_1, parallel_read/filenames)]]
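The last node in the trace belongs to the TFRecord reader, so it may help to confirm that the record file opens and is not empty. Below is a minimal sketch that counts records by walking the TFRecord framing (8-byte little-endian length, 4-byte CRC, payload, 4-byte CRC) without needing TensorFlow. It does not verify the checksums, `count_tfrecords` is a hypothetical helper name, and nothing in the log establishes that an empty or missing file is the cause here.

```python
import struct


def count_tfrecords(path):
    """Count records in a TFRecord file by following its length headers.

    The CRC fields are skipped rather than verified, so this only checks
    that the framing is intact, not that the payloads are valid.
    """
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)  # uint64 little-endian payload length
            if not header:
                return count  # clean end of file
            if len(header) < 8:
                raise ValueError("truncated length field")
            (length,) = struct.unpack("<Q", header)
            if len(f.read(4)) < 4:  # CRC of the length field
                raise ValueError("truncated length checksum")
            if len(f.read(length)) < length:  # serialized example bytes
                raise ValueError("truncated payload")
            if len(f.read(4)) < 4:  # CRC of the payload
                raise ValueError("truncated payload checksum")
            count += 1
```

Calling `count_tfrecords(r"F:\GitHub\raccoon_dataset-master\data\train.record")` raises `FileNotFoundError` if the path is wrong and returns 0 if the file was written without any examples; either outcome would leave the input queue with nothing to dequeue.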