YOLOv6

ERROR in evaluate and save model. ERROR in training loop or eval/save model.

Status: Open. leoyahaha opened this issue 2 years ago; 7 comments.

Training start...

 Epoch  iou_loss   l1_loss  obj_loss  cls_loss

  0%|          | 0/691 [00:00<?, ?it/s]
/home/acus/anaconda3/envs/mengshan/lib/python3.8/site-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2228.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
     0/399     2.178     1.209      3.51     1.606: 100%|██████████| 691/691 [02:23<00:00,
Inferencing model in val datasets.: 100%|███████████████████| 87/87 [00:37<00:00, 2.32it/s]

Evaluating speed.

Evaluating mAP by pycocotools.
Saving runs/train/exp6/predictions.json...
loading annotations into memory...
Done (t=0.04s)
creating index...
index created!
Loading and preparing results...
DONE (t=4.52s)
creating index...
index created!
ERROR in evaluate and save model.
ERROR in training loop or eval/save model.

Training completed in 0.053 hours.
Traceback (most recent call last):
  File "tools/train.py", line 112, in <module>
    main(args)
  File "tools/train.py", line 102, in main
    trainer.train()
  File "/media/acus/new_disk/meng/YOLOv6-main/yolov6/core/engine.py", line 75, in train
    self.train_in_loop()
  File "/media/acus/new_disk/meng/YOLOv6-main/yolov6/core/engine.py", line 94, in train_in_loop
    self.eval_and_save()
  File "/media/acus/new_disk/meng/YOLOv6-main/yolov6/core/engine.py", line 120, in eval_and_save
    self.eval_model()
  File "/media/acus/new_disk/meng/YOLOv6-main/yolov6/core/engine.py", line 139, in eval_model
    results = eval.run(self.data_dict,
  File "/home/acus/anaconda3/envs/mengshan/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/media/acus/new_disk/meng/YOLOv6-main/tools/eval.py", line 83, in run
    eval_result = val.eval_model(pred_result, model, dataloader, task)
  File "/media/acus/new_disk/meng/YOLOv6-main/yolov6/core/evaler.py", line 128, in eval_model
    cocoEval = COCOeval(anno, pred, 'bbox')
  File "/home/acus/anaconda3/envs/mengshan/lib/python3.8/site-packages/pycocotools/cocoeval.py", line 80, in __init__
    self.params.imgIds = sorted(cocoGt.getImgIds())
TypeError: '<' not supported between instances of 'str' and 'int'

After training completes, but before the evaluation results are printed, I get the error above. I would really appreciate any insight into how to resolve this issue as soon as possible.

leoyahaha, Jul 05 '22

I have the same problem. On Google Colab it works perfectly, but on my local machine it throws this error.

JovanBosic, Jul 05 '22

I fixed this issue by configuring dataset.yaml instead of coco.yaml. I treated my data as a custom dataset, because I use only two of the COCO classes.

geekdreamer04, Jul 05 '22

I solved the problem myself. I am using my own dataset. I noticed that training ran without errors when I used only part of the data, so I narrowed it down step by step and found that images whose file names consist purely of digits caused the error. Presumably these purely numeric names are treated as int rather than str. I solved the problem by deleting the images with purely numeric names; renaming them should work as well.

leoyahaha, Jul 06 '22
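The diagnosis above can be reproduced without YOLOv6 at all. The sketch below assumes, as leoyahaha suspects, that image ids are derived from file names and that purely numeric names end up as int while all other names stay str. The helper image_id_from_name is hypothetical and is not taken from the YOLOv6 source; sort_ids only mimics the sorted() call on line 80 of cocoeval.py shown in the traceback.

```python
from pathlib import Path


def image_id_from_name(file_name: str):
    """Hypothetical id scheme: numeric stems become int, all others stay str.

    This imitates the behaviour suspected in the thread; it is NOT
    YOLOv6's actual code.
    """
    stem = Path(file_name).stem
    return int(stem) if stem.isdigit() else stem


def sort_ids(file_names):
    """Sort image ids the way cocoeval.py line 80 does: plain sorted(ids)."""
    ids = [image_id_from_name(name) for name in file_names]
    return sorted(ids)


if __name__ == "__main__":
    # Only non-numeric names: every id is a str, so sorting works.
    print(sort_ids(["cat_001.jpg", "dog_002.jpg"]))

    # One purely numeric name mixes int and str ids, so sorting fails.
    try:
        sort_ids(["cat_001.jpg", "12345311232.jpg"])
    except TypeError as exc:
        print("TypeError:", exc)
```

Python 3 refuses to order an int against a str, so a single numeric file name in an otherwise non-numeric dataset is enough to trigger the TypeError, which would also explain why training on only part of the data succeeded.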

I can also confirm that this problem occurs when an image name consists only of digits, for example '12345311232.jpg'; such images must be removed from the dataset before the model can be trained and evaluated.

JovanBosic, Jul 06 '22
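Instead of deleting the images, they can be renamed, which leoyahaha suggests should also work. A minimal sketch, assuming a YOLO-style layout in which every image in an images directory has a label file with the same stem and a .txt suffix in a matching labels directory. The function names, the "img_" prefix and the directory paths are hypothetical choices for illustration. It defaults to a dry run that only reports what it would rename.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def find_numeric_images(image_dir):
    """Return image files whose stem is only digits, e.g. 12345311232.jpg."""
    return sorted(
        p
        for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES and p.stem.isdigit()
    )


def rename_numeric_images(image_dir, label_dir, prefix="img_", dry_run=True):
    """Prefix numeric image names (and matching .txt labels) so ids stay str.

    Assumes each image has a label with the same stem in label_dir.
    With dry_run=True nothing is renamed; the planned (old, new) pairs
    are only returned.
    """
    plan = []
    for image in find_numeric_images(image_dir):
        new_image = image.with_name(prefix + image.name)
        label = Path(label_dir) / (image.stem + ".txt")
        new_label = label.with_name(prefix + label.name)
        if new_image.exists() or new_label.exists():
            raise FileExistsError(f"refusing to overwrite {new_image} or {new_label}")
        plan.append((image, new_image))
        if label.exists():
            plan.append((label, new_label))
    if not dry_run:
        for old, new in plan:
            old.rename(new)
    return plan
```

Usage (paths are hypothetical): call rename_numeric_images("custom_dataset/images/train", "custom_dataset/labels/train") first to inspect the plan, then again with dry_run=False, and repeat for the val split. Any cached label or annotation files that were generated from the old names would presumably need to be regenerated afterwards.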

I have this error:

ERROR in training loop or eval/save model.

Training completed in 18.118 hours.
Traceback (most recent call last):
  File "tools/train_face.py", line 86, in <module>
    main(args)
  File "tools/train_face.py", line 76, in main
    trainer.train()
  File "/home1/code/yolov6_face/yolov6/core/engine.py", line 62, in train
    self.train_in_loop()
  File "/home1/code/yolov6_face/yolov6/core/engine.py", line 75, in train_in_loop
    self.train_in_steps()
  File "/home1/code/yolov6_face/yolov6/core/engine.py", line 96, in train_in_steps
    self.scaler.scale(total_loss).backward()
  File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution
    65/399     1.141     0.425      1.78    0.4827:  37%|###7      | 205/554 [09:42<16:32,  2.84s/it]

sssssshf, Jul 07 '22

You are right. I modified line 80 of cocoeval.py to

    self.params.imgIds = sorted([str(i) for i in cocoGt.getImgIds()])

and then it worked.

dylanlb, Jul 08 '22
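The sketch below mimics line 80 of pycocotools' cocoeval.py before and after dylanlb's edit, using plain lists and a dict instead of real COCO objects; all function names are hypothetical. It also illustrates a caveat that is based on reading pycocotools rather than on anything reported in this thread: COCO.getAnnIds only keeps ids for which `imgId in self.imgToAnns` holds, and that index is still keyed by the original ids. After the cast, an image whose original id was an int may therefore no longer be matched with its ground-truth annotations, so renaming the files, as leoyahaha suggested, is probably the safer fix.

```python
from collections import defaultdict


def original_img_ids(gt_img_ids):
    """Mimics cocoeval.py line 80: sorted() raises TypeError on mixed int/str ids."""
    return sorted(gt_img_ids)


def patched_img_ids(gt_img_ids):
    """dylanlb's edit: cast every id to str before sorting."""
    return sorted([str(i) for i in gt_img_ids])


def matching_ids(img_ids, img_to_anns):
    """Mimics the `imgId in imgToAnns` filter used by pycocotools' COCO.getAnnIds."""
    return [i for i in img_ids if i in img_to_anns]


if __name__ == "__main__":
    gt_img_ids = ["cat_001", 12345311232]  # hypothetical mixed ids
    try:
        original_img_ids(gt_img_ids)
    except TypeError as exc:
        print("original line 80 fails:", exc)

    patched = patched_img_ids(gt_img_ids)
    print("patched ids:", patched)

    # The ground-truth index is still keyed by the original int id,
    # so the str "12345311232" no longer finds its annotations.
    img_to_anns = defaultdict(list, {"cat_001": ["ann_1"], 12345311232: ["ann_2"]})
    print("ids with ground truth:", matching_ids(patched, img_to_anns))
```

The edit does make the sort succeed, which matches dylanlb's report; whether the resulting mAP still covers the numerically named images is what the last check is meant to probe.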

@dylanlb, could you share the exact modification you made? I have the same problem.

abdulghani91, Feb 02 '23

Yes, this solved the pycocotools error "'<' not supported between instances of 'str' and 'int'" that had been bothering me for two days. Thank you very much.

xlgong, Mar 23 '23

@dylanlb Hello, could you tell me what cocoeval.py is and where exactly you found this line?

Egorundel, Sep 15 '23
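cocoeval.py is the evaluation module of the pycocotools package, and the line in question is inside COCOeval.__init__. The traceback in the original report shows it installed at /home/acus/anaconda3/envs/mengshan/lib/python3.8/site-packages/pycocotools/cocoeval.py, but the location differs per environment, and the exact line number may differ between pycocotools versions. A small sketch for finding the file in the currently active environment; locate_module is a hypothetical helper built on the standard importlib.util.find_spec.

```python
import importlib.util


def locate_module(name):
    """Return the source file of an installed module, or None if it is missing.

    For a dotted name such as "pycocotools.cocoeval", find_spec imports the
    parent package first and raises ModuleNotFoundError if it is absent.
    """
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:  # parent package (e.g. pycocotools) not installed
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    path = locate_module("pycocotools.cocoeval")
    if path is None:
        print("pycocotools is not installed in this environment")
    else:
        print(path)
```

Run it with the same Python interpreter (for example, the same conda environment) that runs tools/train.py; otherwise it may report a different copy of pycocotools than the one training actually uses.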