
python train.py: error during training. Why?

Open • jiangxinufo opened this issue 3 years ago • 1 comment

(yolov3-tf2-cpu) (venv) M:\MachineLearning\yolov3-tf2>python train.py --dataset ./data/voc2007_train_stone.tfrecord --val_dataset ./data/voc2007_val_stone.tfrecord --classes ./data/stone.names --num_classes 1 --mode fit --transfer darknet --batch_size 4 --epochs 20 --weights ./checkpoints/yolov3.tf --weights_num_classes 80
2021-04-24 13:50:14.338255: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX
Epoch 1/20
2021-04-24 13:50:57.964647: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Invalid argument: Paddings must be non-negative: 0 -12
[[{{node Pad}}]]
[[IteratorGetNext]]
2021-04-24 13:50:57.975653: I tensorflow/core/profiler/lib/profiler_session.cc:225] Profiler session started.
1/Unknown - 13s 13s/step
WARNING:tensorflow:Reduce LR on plateau conditioned on metric val_loss which is not available. Available metrics are: lr
W0424 13:50:57.973832 2388 callbacks.py:1934] Reduce LR on plateau conditioned on metric val_loss which is not available. Available metrics are: lr
WARNING:tensorflow:Early stopping conditioned on metric val_loss which is not available. Available metrics are:
W0424 13:50:57.973832 2388 callbacks.py:1286] Early stopping conditioned on metric val_loss which is not available. Available metrics are:

Epoch 00001: saving model to checkpoints/yolov3_train_1.tf
1/Unknown - 17s 17s/step
Traceback (most recent call last):
  File "train.py", line 195, in <module>
    app.run(main)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\absl\app.py", line 303, in run
    _run_main(main, args)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\absl\app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "train.py", line 190, in main
    validation_data=val_dataset)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 819, in fit
    use_multiprocessing=use_multiprocessing)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 342, in fit
    total_epochs=epochs)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 128, in run_one_epoch
    batch_outs = execution_function(iterator)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py", line 98, in execution_function
    distributed_function(input_fn))
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 568, in __call__
    result = self._call(*args, **kwds)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 632, in _call
    return self._stateless_fn(*args, **kwds)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\eager\function.py", line 2363, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\eager\function.py", line 1611, in _filtered_call
    self.captured_inputs)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\eager\function.py", line 1692, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\eager\function.py", line 545, in call
    ctx=ctx)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\eager\execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Paddings must be non-negative: 0 -12
[[{{node Pad}}]]
[[IteratorGetNext]] [Op:__inference_distributed_function_47459]

Function call stack: distributed_function

WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-8
W0424 13:51:03.144867 2388 util.py:144] Unresolved object in checkpoint: (root).layer-8
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-9
W0424 13:51:03.144867 2388 util.py:144] Unresolved object in checkpoint: (root).layer-9
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-10
W0424 13:51:03.144867 2388 util.py:144] Unresolved object in checkpoint: (root).layer-10
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-11
W0424 13:51:03.144867 2388 util.py:144] Unresolved object in checkpoint: (root).layer-11
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
W0424 13:51:03.144867 2388 util.py:152] A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
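
The InvalidArgumentError in the log above names a Pad node reached through IteratorGetNext, which suggests the failure happens in the input pipeline rather than in the model itself. One possible reading, not confirmed anywhere in this thread, is that each image's list of boxes is padded to a fixed length and one image has 12 more boxes than that length, so the computed padding is negative. The sketch below is a plain-Python illustration of that arithmetic only; `pad_boxes` and `max_boxes` are hypothetical names, not code from the repository.

```python
def pad_boxes(boxes, max_boxes=100):
    """Pad (or truncate) a list of boxes to exactly max_boxes rows.

    boxes: list of [x1, y1, x2, y2, class_id] rows (assumed layout).
    max_boxes: hypothetical fixed length expected downstream.
    """
    pad_rows = max_boxes - len(boxes)  # negative when there are too many boxes
    if pad_rows < 0:
        # Drop the surplus rows instead of asking for a negative padding.
        boxes = boxes[:max_boxes]
        pad_rows = 0
    return boxes + [[0.0] * 5 for _ in range(pad_rows)]


# Under this reading, 112 boxes against a limit of 100 gives 100 - 112 = -12,
# which would match the "0 -12" pair in the error message.
too_many = [[0.1, 0.1, 0.2, 0.2, 0.0]] * 112
padded = pad_boxes(too_many)
```

If this is the cause, checking the annotation files for images with an unusually large number of boxes would be a reasonable first step.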

(yolov3-tf2-cpu) (venv) M:\MachineLearning\yolov3-tf2>
(yolov3-tf2-cpu) (venv) M:\MachineLearning\yolov3-tf2>
(yolov3-tf2-cpu) (venv) M:\MachineLearning\yolov3-tf2>python train.py --dataset ./data/voc2007_train_stone.tfrecord --val_dataset ./data/voc2007_val_stone.tfrecord --classes ./data/stone.names --num_classes 1 --mode fit --transfer darknet --batch_size 4 --epochs 20 --weights ./checkpoints/yolov3.tf --weights_num_classes 80
2021-04-24 14:16:12.964344: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX
Traceback (most recent call last):
  File "train.py", line 197, in <module>
    app.run(main)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\absl\app.py", line 303, in run
    _run_main(main, args)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\absl\app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "train.py", line 189, in main
    step_per_epoch=x.shape[0]//Batchsize,
NameError: name 'x' is not defined
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-8
W0424 14:16:31.858063 5304 util.py:144] Unresolved object in checkpoint: (root).layer-8
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-9
W0424 14:16:31.859061 5304 util.py:144] Unresolved object in checkpoint: (root).layer-9
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-10
W0424 14:16:31.859061 5304 util.py:144] Unresolved object in checkpoint: (root).layer-10
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-11
W0424 14:16:31.859061 5304 util.py:144] Unresolved object in checkpoint: (root).layer-11
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
W0424 14:16:31.860061 5304 util.py:152] A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
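
The NameError in the run above comes from the edited train.py line `step_per_epoch=x.shape[0]//Batchsize`: no `x` is defined inside `main`. Separately, Keras `Model.fit` spells this argument `steps_per_epoch`. If the intent was to give `fit` an explicit step count, a minimal sketch follows, assuming the number of training examples is already known from elsewhere; `count_steps` and `num_examples` are hypothetical names.

```python
import math


def count_steps(num_examples, batch_size):
    """Return the number of batches needed to see every example once.

    num_examples: size of the training set (assumed known; hypothetical input).
    batch_size: examples per batch, e.g. the value passed as --batch_size.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Round up so a final partial batch still counts as a step.
    return math.ceil(num_examples / batch_size)


# With --batch_size 4 and, say, 10 training images, this gives 3 steps.
steps = count_steps(10, 4)
```

This only addresses the NameError from the edited line; it is unrelated to the "Paddings must be non-negative" error in the other runs.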

(yolov3-tf2-cpu) (venv) M:\MachineLearning\yolov3-tf2>python train.py --dataset ./data/voc2007_train_stone.tfrecord --val_dataset ./data/voc2007_val_stone.tfrecord --classes ./data/stone.names --num_classes 1 --mode fit --transfer darknet --batch_size 4 --epochs 20 --weights ./checkpoints/yolov3.tf --weights_num_classes 80
2021-04-24 14:18:51.426349: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX
Epoch 1/20
2021-04-24 14:19:34.278458: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Invalid argument: Paddings must be non-negative: 0 -12
[[{{node Pad}}]]
[[IteratorGetNext]]
2021-04-24 14:19:34.290370: I tensorflow/core/profiler/lib/profiler_session.cc:225] Profiler session started.
1/Unknown - 13s 13s/step
WARNING:tensorflow:Reduce LR on plateau conditioned on metric val_loss which is not available. Available metrics are: lr
W0424 14:19:34.282967 1632 callbacks.py:1934] Reduce LR on plateau conditioned on metric val_loss which is not available. Available metrics are: lr
WARNING:tensorflow:Early stopping conditioned on metric val_loss which is not available. Available metrics are:
W0424 14:19:34.282967 1632 callbacks.py:1286] Early stopping conditioned on metric val_loss which is not available. Available metrics are:

Epoch 00001: saving model to checkpoints/yolov3_train_1.tf
1/Unknown - 17s 17s/step
Traceback (most recent call last):
  File "train.py", line 196, in <module>
    app.run(main)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\absl\app.py", line 303, in run
    _run_main(main, args)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\absl\app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "train.py", line 191, in main
    validation_data=val_dataset)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 819, in fit
    use_multiprocessing=use_multiprocessing)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 342, in fit
    total_epochs=epochs)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 128, in run_one_epoch
    batch_outs = execution_function(iterator)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py", line 98, in execution_function
    distributed_function(input_fn))
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 568, in __call__
    result = self._call(*args, **kwds)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 632, in _call
    return self._stateless_fn(*args, **kwds)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\eager\function.py", line 2363, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\eager\function.py", line 1611, in _filtered_call
    self.captured_inputs)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\eager\function.py", line 1692, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\eager\function.py", line 545, in call
    ctx=ctx)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\eager\execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Paddings must be non-negative: 0 -12
[[{{node Pad}}]]
[[IteratorGetNext]] [Op:__inference_distributed_function_47459]

Function call stack: distributed_function

WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-8
W0424 14:19:38.904387 1632 util.py:144] Unresolved object in checkpoint: (root).layer-8
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-9
W0424 14:19:38.904387 1632 util.py:144] Unresolved object in checkpoint: (root).layer-9
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-10
W0424 14:19:38.904387 1632 util.py:144] Unresolved object in checkpoint: (root).layer-10
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-11
W0424 14:19:38.904387 1632 util.py:144] Unresolved object in checkpoint: (root).layer-11
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
W0424 14:19:38.904387 1632 util.py:152] A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.

jiangxinufo • Apr 24 '21 06:04

I have the same problem. How did you solve it?

ycgr • Apr 18 '22 03:04