YOLOv6

Low accuracy

Open sssssshf opened this issue 2 years ago • 8 comments

Why is the accuracy still this low after training to epoch 60? I reduced the FLOPs to about 65M.

Evaluating mAP by pycocotools.
Saving ./runs/train/yolov6_face_66M_0705/predictions.json...
loading annotations into memory...
Done (t=0.11s)
creating index...
index created!
Loading and preparing results...
DONE (t=2.71s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
DONE (t=36.83s).
Accumulating evaluation results...
DONE (t=5.91s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.001
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.001
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.001
Epoch: 60 | mAP@0.5: 0.00012148454109214603 | mAP@0.50:0.95: 1.989929560401417e-05

sssssshf avatar Jul 07 '22 01:07 sssssshf

Something is clearly wrong here; there is no way it should be this bad.

d5423197 avatar Jul 07 '22 02:07 d5423197

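> Something is clearly wrong here; there is no way it should be this bad.

Issue 158 is the same problem. I haven't found the cause yet either. In principle, once the data format matches, training should just work.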

sssssshf avatar Jul 07 '22 02:07 sssssshf
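
On the data-format point above: one thing that can be checked without retraining is whether the saved predictions.json lines up with the annotation file used for evaluation. The sketch below assumes the standard COCO results layout (a list of entries with `image_id`, `category_id`, `bbox` as `[x, y, width, height]`, and `score`) and a COCO-style annotation file with `images` and `categories` lists. The function names are hypothetical, and this is only one possible cause of near-zero AP, not a confirmed diagnosis for this issue.

```python
import json


def summarize_mismatches(preds, anns):
    """Compare COCO-style predictions against a COCO-style annotation dict.

    preds: list of dicts with image_id, category_id, bbox ([x, y, w, h]).
    anns:  dict with "images" and "categories" lists (each item has an "id").
    """
    gt_image_ids = {img["id"] for img in anns["images"]}
    gt_cat_ids = {cat["id"] for cat in anns["categories"]}
    pred_image_ids = {p["image_id"] for p in preds}
    pred_cat_ids = {p["category_id"] for p in preds}
    # Count boxes that are malformed or have non-positive width/height.
    bad_boxes = sum(
        1
        for p in preds
        if len(p["bbox"]) != 4 or p["bbox"][2] <= 0 or p["bbox"][3] <= 0
    )
    return {
        # key=str keeps sorting safe if ids mix ints and strings.
        "unknown_image_ids": sorted(pred_image_ids - gt_image_ids, key=str),
        "unknown_category_ids": sorted(pred_cat_ids - gt_cat_ids, key=str),
        "bad_boxes": bad_boxes,
    }


def check_files(pred_path, ann_path):
    """Load both JSON files (paths are supplied by the caller) and compare."""
    with open(pred_path) as f:
        preds = json.load(f)
    with open(ann_path) as f:
        anns = json.load(f)
    return summarize_mismatches(preds, anns)
```

A non-empty `unknown_category_ids` list would mean the predictions use class ids that never appear in the ground truth, so those detections cannot match any annotated box during evaluation.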

My training runs fine; it's just that the final mAP is lower than YOLOv5's. You could visualize your labels, and also check whether your labels are normalized by the image size.

d5423197 avatar Jul 07 '22 02:07 d5423197
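
The normalization check suggested above can be sketched roughly as follows. It assumes YOLOv5-style text labels, one object per line in the form `class cx cy w h`, with the four box values divided by the image width and height so they fall in [0, 1]. Any extra fields after the first five (for example face landmarks) are ignored. The function names and the directory layout are hypothetical, not part of YOLOv6.

```python
from pathlib import Path


def check_yolo_label_line(line):
    """Return a list of problems found in one 'class cx cy w h ...' line."""
    parts = line.split()
    if len(parts) < 5:
        return [f"expected at least 5 fields, got {len(parts)}"]
    try:
        cls = float(parts[0])
        cx, cy, w, h = (float(v) for v in parts[1:5])
    except ValueError:
        return ["non-numeric field"]
    problems = []
    if cls < 0 or cls != int(cls):
        problems.append(f"class id {parts[0]} is not a non-negative integer")
    for name, value in (("cx", cx), ("cy", cy), ("w", w), ("h", h)):
        if not 0.0 <= value <= 1.0:
            problems.append(f"{name}={value} is outside [0, 1]; looks unnormalized")
    if w <= 0 or h <= 0:
        problems.append("non-positive width or height")
    return problems


def check_label_dir(label_dir):
    """Scan every .txt file in label_dir; map 'file:line' to its problems."""
    report = {}
    for path in sorted(Path(label_dir).glob("*.txt")):
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if not line.strip():
                continue  # skip blank lines
            problems = check_yolo_label_line(line)
            if problems:
                report[f"{path.name}:{lineno}"] = problems
    return report
```

Calling `check_label_dir` on the training label folder and getting an empty dict back only says the values are in range; drawing a few boxes on their images is still the more direct way to confirm the labels are right.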

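> My training runs fine; it's just that the final mAP is lower than YOLOv5's. You could visualize your labels, and also check whether your labels are normalized by the image size.

My face dataset was previously used with yolov5face, where training was normal and inference worked too. The labels are fine; I've checked them. What dataset are you using, and which config file?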

sssssshf avatar Jul 07 '22 02:07 sssssshf

I'm using the default config files, unchanged.

d5423197 avatar Jul 07 '22 03:07 d5423197

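> I'm using the default config files, unchanged.

Mine is face data, and I made the network smaller. Maybe some hyperparameters aren't adapted to that. I'll run a few more experiments to track down the problem.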

sssssshf avatar Jul 07 '22 03:07 sssssshf

So is the mAP still this low over the next few hundred epochs of training?

d5423197 avatar Jul 07 '22 03:07 d5423197

> So is the mAP still this low over the next few hundred epochs of training?

It errored out later on, possibly while saving the model.

ERROR in training loop or eval/save model.

Training completed in 18.118 hours.
Traceback (most recent call last):
  File "tools/train_face.py", line 86, in <module>
    main(args)
  File "tools/train_face.py", line 76, in main
    trainer.train()
  File "/home1/code/yolov6_face/yolov6/core/engine.py", line 62, in train
    self.train_in_loop()
  File "/home1/code/yolov6_face/yolov6/core/engine.py", line 75, in train_in_loop
    self.train_in_steps()
  File "/home1/code/yolov6_face/yolov6/core/engine.py", line 96, in train_in_steps
    self.scaler.scale(total_loss).backward()
  File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution
65/399 1.141 0.425 1.78 0.4827: 37%|###7 | 205/554 [09:42<16:32, 2.84s/it]

sssssshf avatar Jul 07 '22 03:07 sssssshf