K210_Yolo_framework
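Hello, I ran into this error when training on a custom dataset: ValueError: Empty training data. Every step before training works, and I previously got training and deployment running end to end on the original VOC dataset. I am now doing a small project with 4 classes. The dataset was built in VOC format and the earlier steps all succeed, but as soon as I reach training this happens: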
(tf115) F:\1_work\K210\k210-for-yolo\yolo-for-trash_detection-k210>make train MODEL=yolo_mobilev1 DEPTHMUL=0.75 MAXEP=10 ILR=0.001 DATASET=voc CLSNUM=4 IAA=False BATCH=8
python ./keras_train.py
--train_set voc
--class_num 4
--pre_ckpt ""
--model_def yolo_mobilev1
--depth_multiplier 0.75
--augmenter False
--image_size 224 320
--output_size 7 10 14 20
--batch_size 8
--rand_seed 3
--max_nrof_epochs 10
--init_learning_rate 0.001
--learning_rate_decay_factor 0
--obj_weight 1
--noobj_weight 1
--wh_weight 1
--obj_thresh 0.7
--iou_thresh 0.5
--vaildation_split 0.05
--log_dir log
--is_prune False
--prune_initial_sparsity 0.5
--prune_final_sparsity 0.9
--prune_end_epoch 5
--prune_frequency 100
2020-08-13 16:31:51.021329: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
- https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
- https://github.com/tensorflow/addons
- https://github.com/tensorflow/io (for I/O related ops) If you depend on functionality not listed there, please file an issue.
2020-08-13 16:31:54.680652: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2020-08-13 16:31:54.693250: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-08-13 16:31:54.730848: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2060 major: 7 minor: 5 memoryClockRate(GHz): 1.2
pciBusID: 0000:01:00.0
2020-08-13 16:31:54.736948: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2020-08-13 16:31:54.746169: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-08-13 16:31:54.753784: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
2020-08-13 16:31:54.760818: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll
2020-08-13 16:31:54.767979: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
2020-08-13 16:31:54.774382: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
2020-08-13 16:31:54.787883: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-08-13 16:31:54.792935: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-08-13 16:31:55.371745: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-13 16:31:55.376201: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-08-13 16:31:55.379023: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-08-13 16:31:55.382793: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4608 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
[ INFO ] data augment is False
WARNING:tensorflow:From F:\software\Anaconda3\envs\tf115\lib\site-packages\tensorflow_core\python\data\util\random_seed.py:58: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
2020-08-13 16:31:55.534760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2060 major: 7 minor: 5 memoryClockRate(GHz): 1.2
pciBusID: 0000:01:00.0
2020-08-13 16:31:55.541031: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2020-08-13 16:31:55.544708: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-08-13 16:31:55.547742: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
2020-08-13 16:31:55.551895: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll
2020-08-13 16:31:55.555008: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
2020-08-13 16:31:55.559177: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
2020-08-13 16:31:55.562276: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-08-13 16:31:55.565958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-08-13 16:31:55.569220: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce RTX 2060 major: 7 minor: 5 memoryClockRate(GHz): 1.2
pciBusID: 0000:01:00.0
2020-08-13 16:31:55.574059: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2020-08-13 16:31:55.578232: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2020-08-13 16:31:55.581370: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
2020-08-13 16:31:55.585411: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll
2020-08-13 16:31:55.588457: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
2020-08-13 16:31:55.592139: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
2020-08-13 16:31:55.595837: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-08-13 16:31:55.599397: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-08-13 16:31:55.602502: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-13 16:31:55.606356: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-08-13 16:31:55.608364: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-08-13 16:31:55.610806: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4608 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
[ INFO ] data augment is False
WARNING:tensorflow:From F:\software\Anaconda3\envs\tf115\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py:1630: calling BaseResourceVariable.init (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Train on 21 steps
Epoch 1/10
2020-08-13 16:32:22.746351: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-08-13 16:32:24.054899: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
2020-08-13 16:32:24.128920: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
1/21 [>.............................] - ETA: 2:31 - loss: 792.4858 - l1_loss: 131.6574 - l2_loss: 660.4179 - l1_p: 0.0000e+00 - l1_r: 0.0000e+00 - l2_p: 5.8514e-04 - l2_r: 0.2000
2020-08-13 16:32:26.777512: I tensorflow/core/profiler/lib/profiler_session.cc:205] Profiler session started.
2020-08-13 16:32:26.787342: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cupti64_100.dll
WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.280863). Check your callbacks.
2/21 [=>............................] - ETA: 1:17 - loss: 817.4302 - l1_loss: 172.4517 - l2_loss: 644.5675 - l1_p: 0.0012 - l1_r: 0.2500 - l2_p: 0.0025 - l2_r: 0.5000
2020-08-13 16:32:27.322285: I tensorflow/core/platform/default/device_tracer.cc:588] Collecting 3504 kernel records, 316 memcpy records.
WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.444144). Check your callbacks.
3/21 [===>..........................] - ETA: 52s - loss: 745.5250 - l1_loss: 159.0245 - l2_loss: 586.0892 - l1_p: 8.5985e-04 - l1_r: 0.1429 - l2_p: 0.0023 - l2_r: 0.3636
WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.280863). Check your callbacks.
4/21 [====>.........................] - ETA: 37s - loss: 695.2993 - l1_loss: 148.6290 - l2_loss: 546.2584 - l1_p: 0.0015 - l1_r: 0.1538 - l2_p: 0.0022 - l2_r: 0.3200
WARNING:tensorflow:Method (on_train_batch_end) is slow compared to the batch update (0.117583). Check your callbacks.
20/21 [===========================>..] - ETA: 0s - loss: 364.4713 - l1_loss: 68.8265 - l2_loss: 295.2221 - l1_p: 0.0012 - l1_r: 0.0385 - l2_p: 0.0020 - l2_r: 0.0563
Traceback (most recent call last):
File "./keras_train.py", line 155, in
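What is causing this? Does the dataset size need to change? Is it a path problem, or a batch problem? I already reduced the batch size to 1 and still get the same error.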
Same problem here. Can anyone give me some pointers?
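With a custom dataset it is easy to end up with too little data. In addition, my earlier code takes the test set directly as a portion of the training set, so there may not be enough samples to fill the buffers inside `tf.data.Dataset`, which would produce this error. Where the input data pipeline is defined, you can make the buffers used by operations such as `shuffle` and `map` smaller.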
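I ran into the same problem. The program automatically splits the dataset into a training set and a test set, and the parameters `batch_size` and `vaildation_split` directly determine how large each one is. With the default settings `batch_size = 32`, `vaildation_split = 0.05`, the test set ends up too small and training fails partway through.
I solved it by adjusting the parameters, for example `batch_size = 16`, `vaildation_split = 0.1`.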
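To make the arithmetic behind that reply concrete, here is a minimal sketch of how the two parameters interact. The 200-image dataset and the `split_sizes` helper are hypothetical. The truncating split is only my assumption about how the framework divides the data, while counting full batches only follows from the `batch(batch_size, True)` call quoted later in this thread.

```python
def split_sizes(total_images, validation_split, batch_size):
    """Estimate split sizes and the number of full batches in each split.

    Assumption: the validation set is int(total * split) images and only
    full batches count (drop_remainder=True), hence the floor division.
    """
    n_val = int(total_images * validation_split)
    n_train = total_images - n_val
    return {
        "train_images": n_train,
        "val_images": n_val,
        "train_steps": n_train // batch_size,
        "val_steps": n_val // batch_size,
    }


# Hypothetical 200-image dataset, not the poster's real numbers.
default = split_sizes(200, validation_split=0.05, batch_size=32)
adjusted = split_sizes(200, validation_split=0.1, batch_size=16)
print(default)   # val_steps is 0: not even one full validation batch
print(adjusted)  # val_steps is 1: validation can run
```

Under these assumptions the default settings leave 10 validation images, fewer than one batch of 32, while the adjusted settings leave 20 images for a batch of 16.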
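In tools/utils.py, change the shuffle argument in the _create_dataset function as shown below, and in the train_fit function change the validation_step parameter to a fixed value. I have tested this and it fixes the problem!! (Just set the buffer size to the size of the dataset!!)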
def _create_dataset():  # parameter list omitted in the original post
    # Use the size of the split being built as the shuffle buffer size.
    buffer_size = self.train_total_data if is_training else self.test_total_data
    dataset = (
        tf.data.Dataset.from_generator(
            gen, (tf.framework_ops.dtypes.string, tf.float32), ([], [None, 5]))
        .shuffle(buffer_size, rand_seed)
        .repeat()
        .map(_parser_wrapper, tf.data.experimental.AUTOTUNE)
        .batch(batch_size, True)  # True = drop_remainder
        .prefetch(tf.data.experimental.AUTOTUNE))
    return dataset
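As a rough companion to this fix, the sketch below picks the shuffle buffer sizes and fixed step counts from the split sizes before the pipeline is built. `pipeline_settings` and the sample counts are hypothetical names and numbers of mine, not part of the framework. The idea is that with `repeat()` placed before `batch(batch_size, True)`, every batch is full as long as the split has at least one sample, so clamping the step counts to at least 1 should keep Keras from being handed zero validation steps.

```python
def pipeline_settings(n_train, n_val, batch_size):
    """Suggest shuffle buffers and fixed step counts for a small dataset.

    Hypothetical helper: buffers equal the split sizes (as in the fix above)
    and step counts are clamped so they are never zero.
    """
    if n_train < 1 or n_val < 1:
        raise ValueError("each split needs at least one sample")
    return {
        "train_shuffle_buffer": n_train,
        "val_shuffle_buffer": n_val,
        "steps_per_epoch": max(1, n_train // batch_size),
        "validation_steps": max(1, n_val // batch_size),
    }


# Hypothetical split sizes for illustration only.
settings = pipeline_settings(n_train=168, n_val=9, batch_size=8)
tiny = pipeline_settings(n_train=40, n_val=2, batch_size=8)
print(settings)
print(tiny)  # validation_steps stays at 1 even though 2 < batch_size
```

The resulting values would then be passed by hand to `shuffle(...)` and to the fixed validation step argument mentioned in the reply above; how `train_fit` consumes them is not shown in this thread.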