tf_classification

Error at warm up training

Open KevinRodrigues05 opened this issue 6 years ago • 0 comments

$ CUDA_VISIBLE_DEVICES=0 python train.py
> --tfrecords $DATASET_DIR/train*
> --logdir $EXPERIMENT_DIR/logdir/finetune
> --config $EXPERIMENT_DIR/config_train.yaml
> --pretrained_model $IMAGENET_PRETRAINED_MODEL
> --trainable_scopes InceptionV3/Logits InceptionV3/AuxLogits
> --checkpoint_exclude_scopes InceptionV3/Logits InceptionV3/AuxLogits
> --learning_rate_decay_type fixed
> --lr 0.01

WARNING:tensorflow:From train.py:268: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
WARNING:tensorflow:From /home/shashwat/.local/lib/python2.7/site-packages/tensorflow/python/training/input.py:187: __init__ (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the tf.data module.
WARNING:tensorflow:From /home/shashwat/.local/lib/python2.7/site-packages/tensorflow/python/training/input.py:187: add_queue_runner (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the tf.data module.
INFO:tensorflow:Restoring variables from /home/shashwat/kevin/ml-2/inception_v3.ckpt
WARNING:tensorflow:From /home/shashwat/.local/lib/python2.7/site-packages/tensorflow/contrib/slim/python/slim/learning.py:737: __init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2018-10-24 18:32:36.878151: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
INFO:tensorflow:Restoring parameters from /home/shashwat/kevin/ml-2/inception_v3.ckpt
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path /home/shashwat/kevin/ml-2/tfrecords/cub_image_experiment/logdir/finetune/model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Recording summary at step 0.
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.DataLossError'>, corrupted record at 0
  [[{{node inputs/ReaderReadV2}} = ReaderReadV2[_device="/job:localhost/replica:0/task:0/device:CPU:0"](inputs/TFRecordReaderV2, inputs/input_producer)]]
INFO:tensorflow:Finished training! Saving model to disk.
Traceback (most recent call last):
  File "train.py", line 504, in <module>
    main()
  File "train.py", line 500, in main
    read_images=args.read_images
  File "train.py", line 404, in train
    log_every_n_steps = cfg.LOG_EVERY_N_STEPS
  File "/home/shashwat/.local/lib/python2.7/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 785, in train
    ignore_live_threads=ignore_live_threads)
  File "/home/shashwat/.local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 833, in stop
    ignore_live_threads=ignore_live_threads)
  File "/home/shashwat/.local/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/home/shashwat/.local/lib/python2.7/site-packages/tensorflow/python/training/queue_runner_impl.py", line 257, in _run
    enqueue_callable()
  File "/home/shashwat/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1215, in _single_operation_run
    self._call_tf_sessionrun(None, {}, [], target_list, None)
  File "/home/shashwat/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.DataLossError: corrupted record at 0
  [[{{node inputs/ReaderReadV2}} = ReaderReadV2[_device="/job:localhost/replica:0/task:0/device:CPU:0"](inputs/TFRecordReaderV2, inputs/input_producer)]]

Please help me solve this error.
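Since the failure is a DataLossError ("corrupted record at 0") raised by the TFRecord reader, one way to narrow it down might be to check each file matched by $DATASET_DIR/train* outside of TensorFlow. The following is only a minimal sketch, assuming the files are uncompressed TFRecords in the standard framing (8-byte little-endian length, 4-byte masked CRC32C of the length, payload, 4-byte masked CRC32C of the payload). The helper names (crc32c, masked_crc32c, first_bad_record) are hypothetical and are not part of tf_classification or TensorFlow.

```python
import struct
import sys


def _make_crc32c_table():
    # Lookup table for CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_crc32c_table()


def crc32c(data):
    # Plain CRC-32C over a bytes-like object.
    crc = 0xFFFFFFFF
    for byte in bytearray(data):
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc32c(data):
    # TFRecord stores a "masked" CRC: rotate right by 15 bits, add a constant.
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def first_bad_record(path):
    # Returns None if every record passes, else the index of the first
    # record whose framing or checksum is wrong.
    with open(path, "rb") as f:
        index = 0
        while True:
            header = f.read(12)
            if not header:
                return None  # clean end of file
            if len(header) < 12:
                return index  # truncated header
            length, length_crc = struct.unpack("<QI", header)
            if masked_crc32c(header[:8]) != length_crc:
                return index  # length field fails its checksum
            payload = f.read(length)
            footer = f.read(4)
            if len(payload) < length or len(footer) < 4:
                return index  # truncated payload or footer
            if masked_crc32c(payload) != struct.unpack("<I", footer)[0]:
                return index  # payload fails its checksum
            index += 1


if __name__ == "__main__":
    # Assumed usage: pass the same files the --tfrecords glob would match.
    for path in sys.argv[1:]:
        bad = first_bad_record(path)
        print("%s: %s" % (path, "ok" if bad is None else "bad record at index %d" % bad))
```

If this reports a bad record at index 0 for a file, that file is probably not a valid uncompressed TFRecord at all (for example, an unrelated file that the train* glob happens to match, or a compressed or partially written file), though that is a guess rather than something the log confirms.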

KevinRodrigues05, Oct 24 '18 13:10