tensorflow-yolov3
Segmentation fault at "Restoring weights" when running train.py
python3 train.py
2019-08-05 06:48:09.794696: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-08-05 06:48:10.439188: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-05 06:48:10.439527: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 0 with properties: name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.62 pciBusID: 0000:03:00.0 totalMemory: 7.79GiB freeMemory: 7.69GiB
2019-08-05 06:48:10.439541: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1490] Adding visible gpu devices: 0
2019-08-05 06:48:10.708758: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-08-05 06:48:10.708805: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] 0
2019-08-05 06:48:10.708812: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0: N
2019-08-05 06:48:10.708890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7402 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:03:00.0, compute capability: 7.5)
=> Restoring weights from: ./checkpoint/yolov3_coco_demo.ckpt ... 0%| | 0/16551 [00:00<?, ?it/s]
Segmentation fault (core dumped)
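I installed the dependencies in requirements.txt as described in the README,
downloaded the VOC dataset, and modified config.py.
voc_annotation.py and convert_weight.py (--train_from_coco) both ran normally.
Running train.py then hits a segmentation fault. Has anyone run into a similar problem?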
It looks like it ran out of memory. Is there any way to fix this? For example, what batch size would be appropriate?
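I changed the training batch size from 6 to 3 and training succeeded. The repo owner set 50 epochs, and each one produces nearly 1 GB of weight data, which is a bit more than my machine can handle.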
I hit this error too, also on a 2070. It still crashes even with the batch size set to 1, and I don't know what to do.
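Same problem here, on an RTX 5000. I have ruled out insufficient GPU memory, dataset defects, and a wrong classes_name, but the problem persists: no training is possible, and the segmentation fault is raised as soon as the first batch executes. I could not find any way to diagnose it, so I gave up. Environment: Python 3.5, Ubuntu 16.04 (Docker).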
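Since the crash prints no Python traceback, one thing that may help narrow it down is the standard-library `faulthandler` module, which dumps the Python stack of every thread when the process receives SIGSEGV. The following is only a minimal sketch: the helper name `enable_crash_traceback` and the log file name are hypothetical, and placing the call at the very top of train.py (before TensorFlow is imported) is an assumption, not something taken from the repository.

```python
import faulthandler


def enable_crash_traceback(log_path="segfault_traceback.log"):
    """Write Python tracebacks to log_path if the process crashes.

    Hypothetical helper for illustration; not part of tensorflow-yolov3.
    """
    # faulthandler writes through the file descriptor, so the file object
    # must stay open for as long as the handler is enabled.
    log_file = open(log_path, "w")
    # Install handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL
    # and dump the stacks of all threads, not only the crashing one.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Calling `enable_crash_traceback()` once at startup and keeping the returned file object alive should leave a traceback in the log after the next crash, showing which Python line was executing when the native code faulted. The same handler can be enabled without editing any code by running `python3 -X faulthandler train.py`, which writes the traceback to stderr instead.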