triplet-reid
training error: OutOfRangeError: End of sequence
I used the code to train on my own dataset, but this error was raised at sess.run(). The detailed log is printed below; I changed some arguments, such as net_input_height and batch_p. My TensorFlow version is 1.7. I don't know what's wrong here.
Instructions for updating: Use the retry module or similar alternatives.
2018-09-27 11:12:06,474 [INFO] train: Training using the following parameters:
2018-09-27 11:12:06,474 [INFO] train: batch_k: 4
2018-09-27 11:12:06,474 [INFO] train: batch_p: 8
2018-09-27 11:12:06,474 [INFO] train: checkpoint_frequency: 1000
2018-09-27 11:12:06,474 [INFO] train: crop_augment: False
2018-09-27 11:12:06,474 [INFO] train: decay_start_iteration: 100000
2018-09-27 11:12:06,474 [INFO] train: detailed_logs: False
2018-09-27 11:12:06,474 [INFO] train: embedding_dim: 128
2018-09-27 11:12:06,475 [INFO] train: experiment_root: F:/projector/GestureClassification/TripletBasedGestureRecognition/experiment_root/20180926/
2018-09-27 11:12:06,475 [INFO] train: flip_augment: False
2018-09-27 11:12:06,475 [INFO] train: head_name: fc1024
2018-09-27 11:12:06,475 [INFO] train: image_root: F:/projector/GestureClassification/data/img/20180919/triplet_data/img/
2018-09-27 11:12:06,475 [INFO] train: initial_checkpoint: None
2018-09-27 11:12:06,475 [INFO] train: learning_rate: 0.0003
2018-09-27 11:12:06,475 [INFO] train: loading_threads: 4
2018-09-27 11:12:06,475 [INFO] train: loss: batch_hard
2018-09-27 11:12:06,476 [INFO] train: margin: soft
2018-09-27 11:12:06,476 [INFO] train: metric: euclidean
2018-09-27 11:12:06,476 [INFO] train: model_name: resnet_v1_50
2018-09-27 11:12:06,476 [INFO] train: net_input_height: 64
2018-09-27 11:12:06,476 [INFO] train: net_input_width: 64
2018-09-27 11:12:06,476 [INFO] train: pre_crop_height: 64
2018-09-27 11:12:06,476 [INFO] train: pre_crop_width: 64
2018-09-27 11:12:06,476 [INFO] train: resume: False
2018-09-27 11:12:06,476 [INFO] train: train_iterations: 250000
2018-09-27 11:12:06,476 [INFO] train: train_set: F:/projector/GestureClassification/data/img/20180919/triplet_data/gesture_train.csv
2018-09-27 11:12:07,403 [INFO] tensorflow: Scale of 0 disables regularizer.
2018-09-27 11:12:07,403 [INFO] tensorflow: Scale of 0 disables regularizer.
2018-09-27 11:12:08,569 [WARNING] tensorflow: From F:\projector\GestureClassification\TripletBasedGestureRecognition\triplet-reid\nets\resnet_v1.py:219: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
2018-09-27 11:12:08,569 [WARNING] tensorflow: From F:\projector\GestureClassification\TripletBasedGestureRecognition\triplet-reid\nets\resnet_v1.py:219: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\ops\gradients_impl.py:100: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2018-09-27 11:12:11.533610: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-09-27 11:12:11.936193: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1344] Found device 0 with properties:
name: GeForce GTX 1060 5GB major: 6 minor: 1 memoryClockRate(GHz): 1.7085
pciBusID: 0000:01:00.0
totalMemory: 5.00GiB freeMemory: 4.12GiB
2018-09-27 11:12:11.936710: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1423] Adding visible gpu devices: 0
2018-09-27 11:12:14.388590: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-27 11:12:14.388811: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:917] 0
2018-09-27 11:12:14.388948: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:930] 0: N
2018-09-27 11:12:14.415769: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3871 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 5GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-09-27 11:12:16.275624: I T:\src\github\tensorflow\tensorflow\core\kernels\cuda_solvers.cc:159] Creating CudaSolver handles for stream 000001A50E54E080
2018-09-27 11:12:20,572 [INFO] tensorflow: F:/projector/GestureClassification/TripletBasedGestureRecognition/experiment_root/20180926/checkpoint-0 is not in all_model_checkpoint_paths. Manually adding it.
2018-09-27 11:12:20,572 [INFO] tensorflow: F:/projector/GestureClassification/TripletBasedGestureRecognition/experiment_root/20180926/checkpoint-0 is not in all_model_checkpoint_paths. Manually adding it.
2018-09-27 11:12:23,207 [INFO] train: Starting training from iteration 0.
Traceback (most recent call last):
  File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\client\session.py", line 1327, in _do_call
    return fn(*args)
  File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\client\session.py", line 1312, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\client\session.py", line 1420, in _call_tf_sessionrun
    status, run_metadata)
  File "D:\Program Files\Python3.5\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
  [[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[?,64,64,3], [?], [?]], output_types=[DT_FLOAT, DT_STRING, DT_STRING], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "F:/projector/GestureClassification/TripletBasedGestureRecognition/triplet-reid/train.py", line 439, in
Caused by op 'IteratorGetNext', defined at:
File "F:/projector/GestureClassification/TripletBasedGestureRecognition/triplet-reid/train.py", line 439, in
OutOfRangeError (see above for traceback): End of sequence [[Node: IteratorGetNext = IteratorGetNextoutput_shapes=[[?,64,64,3], [?], [?]], output_types=[DT_FLOAT, DT_STRING, DT_STRING], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
Process finished with exit code 1
I just found the reason. I have only 7 classes (persons) in my dataset, but I set batch_p to 8.
# Constrain the dataset size to a multiple of the batch-size, so that
# we don't get overlap at the end of each epoch.
dataset = dataset.take((len(unique_pids) // args.batch_p) * args.batch_p)
With 7 PIDs and batch_p = 8, this step evaluates to take(0), so the dataset is empty and iteration ends on the very first step, which raises the error above.
It's a silly mistake, but I suggest adding an if-else check that reports this condition.
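A minimal sketch of what such a check might look like, assuming it runs just before the dataset.take(...) line. The function name usable_pid_count and the error messages are hypothetical and not part of the repository; the code avoids f-strings so it also runs on Python 3.5, as used above.

```python
def usable_pid_count(num_unique_pids, batch_p):
    """Return how many PIDs survive the truncation to a multiple of batch_p.

    Hypothetical helper (not part of triplet-reid): it mirrors the expression
    (len(unique_pids) // args.batch_p) * args.batch_p and fails early with a
    readable message instead of letting take(0) produce an empty dataset.
    """
    if batch_p <= 0:
        raise ValueError("batch_p must be positive, got {}".format(batch_p))
    usable = (num_unique_pids // batch_p) * batch_p
    if usable == 0:
        raise ValueError(
            "batch_p ({}) is larger than the number of unique PIDs ({}); "
            "the dataset would be empty. Lower batch_p or add more "
            "identities.".format(batch_p, num_unique_pids))
    return usable


if __name__ == "__main__":
    print(usable_pid_count(7, 4))   # 7 // 4 * 4
    try:
        usable_pid_count(7, 8)      # the configuration from this issue
    except ValueError as err:
        print("caught:", err)
```

In train.py the returned value could then be passed to dataset.take(...), so the failure shows up at startup rather than as an OutOfRangeError at sess.run().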
Thanks for updating with the reason. Indeed we could add code catching this mistake, I'd happily accept a PR doing so!
you do a good job
@muxizju I just came across this, as I also have few classes. My question is: what happens to the remaining classes if I have, say, 7 classes and batch_p is 4? Are the 3 leftover classes carried over into future batches, or are they just ignored?
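One way to reason about this is a small pure-Python simulation. It rests on an assumption this thread does not confirm: that the PID dataset is reshuffled at the start of every epoch, before the take(...) line, and then repeated. Under that assumption the leftover classes are dropped only for the current epoch and can be drawn again in later ones. The function simulate_epochs and the PID names are hypothetical and only mimic that shuffle, take, repeat order; they do not call TensorFlow.

```python
import random


def simulate_epochs(pids, batch_p, num_epochs, seed=0):
    """Mimic 'shuffle, then take a multiple of batch_p, then repeat'.

    Assumption (not confirmed in this thread): the real pipeline reshuffles
    the PIDs every epoch before truncating them.
    """
    rng = random.Random(seed)
    usable = (len(pids) // batch_p) * batch_p
    epochs = []
    for _ in range(num_epochs):
        shuffled = list(pids)
        rng.shuffle(shuffled)             # new order each epoch
        epochs.append(shuffled[:usable])  # the rest is dropped this epoch
    return epochs


pids = ["p{}".format(i) for i in range(7)]
epochs = simulate_epochs(pids, batch_p=4, num_epochs=50)
seen = set(pid for epoch in epochs for pid in epoch)

print("PIDs used per epoch:", len(epochs[0]))
print("distinct PIDs seen over all epochs:", len(seen))
```

If the assumption holds, each epoch uses 4 of the 7 classes, and which 3 are skipped changes from epoch to epoch.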