yolov7
Validation performance during training differs from after training
Hey,
Basically, after I fine-tune a model and it runs the final performance check, the validation metrics report something like 0.60 mAP on my custom dataset, which seems reasonable.
Then when I evaluate performance on the same validation set used during training, I suddenly get a near-perfect precision-recall tradeoff and an mAP close to 0.90. All of the correct bounding boxes seem to have confidence 1.0 and IoU close to 1.0, while the remaining boxes produce a lot of false positives, though with low confidence.
What could cause this difference? As far as I understand, the validation set during training is only used for things like scheduling the learning rate, so the model shouldn't train on that data, and I should be able to evaluate performance on it.
I'm using the following commands to produce this:
python yolov7/train.py --weights model/yolov7_training.pt --data data/custom.yaml --cfg yolov7/cfg/training/yolov7.yaml --multi-scale --batch-size 8 --img 640 640 --epochs 25 --cache-images --hyp data/hyp.scratch.custom.yaml
python yolov7/test.py --data data/custom.yaml --img 640 --batch 24 --conf 0.001 --iou 0.65 --device 0 --weights model/yolov7_finetuned.pt --save-txt --save-hybrid --save-json --save-conf --augment --task val
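One thing worth noting when comparing the two numbers: the test.py invocation above enables several flags (--augment, --save-hybrid, --save-conf) that, as far as I can tell, the end-of-training evaluation inside train.py does not use, so the two runs are not configured identically. A stripped-down re-run with only the shared settings might look like this (paths taken from the command above; this is a sketch for comparison, not a verified fix):

```shell
# Re-run the same evaluation without the extra flags, so it is as close as
# possible to what train.py runs at the end of training.
python yolov7/test.py \
    --data data/custom.yaml \
    --img 640 \
    --batch 24 \
    --conf 0.001 \
    --iou 0.65 \
    --device 0 \
    --weights model/yolov7_finetuned.pt \
    --task val
```

If this stripped-down run reproduces the training-time numbers, the discrepancy comes from one of the removed flags rather than from the model itself.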
Hey, bumping this as it's really relevant for me, and the question seems simple: how is the validation set used during training?
There are two different parts: first, validation at training time uses an IoU threshold of 0.7; second, the padding parameter is calculated from the batch input.
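To see why a small IoU-threshold gap (0.65 in the test.py command above vs. 0.7 at training time) can matter at all: a detection whose overlap with the ground truth lands between the two values is treated differently by each run. A minimal sketch in plain Python (not yolov7 source, just the standard IoU formula with made-up box coordinates):

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

gt = (0, 0, 100, 100)     # ground-truth box
pred = (19, 0, 119, 100)  # prediction shifted 19 px to the right
iou = box_iou(gt, pred)   # 8100 / 11900 ≈ 0.681

print(iou > 0.65, iou > 0.70)  # True False: passes one threshold, not the other
```

Only boxes in that narrow borderline band flip, which is why a threshold change of this size usually shifts metrics slightly rather than turning 0.65 mAP into 0.995.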
I'm having the same issue working on YOLOv7 for a face detection task. Here is the reported performance on the validation set during training:
Epoch gpu_mem box obj cls total labels img_size
319/319 20G 0.04121 0.003993 0 0.0452 61 640: 100% 187/187 [01:13<00:00, 2.54it/s]
Class Images Labels P R mAP@.5 mAP@.5:.95: 100% 24/24 [00:06<00:00, 3.72it/s]
all 1028 1854 0.85 0.542 0.646 0.325
and here is the result of evaluating the validation set using test.py:
!python test.py --data data/ufdd.yaml --task val --single-cls --verbose --save-hybrid --save-conf --batch 32 --device 0 --weights runs/train/yolov7ufdd7/weights/best.pt --name yolov7ufdd7_val
val: Scanning 'val-ufdd/val.cache' images and labels... 1028 found, 0 missing, 553 empty, 0 corrupted: 100% 1028/1028 [00:00<?, ?it/s]
Class Images Labels P R mAP@.5 mAP@.5:.95: 100% 33/33 [04:31<00:00, 8.24s/it]
all 1028 1854 1 1 0.995 0.995
As you can see, the results are near perfect and not at all realistic, not to mention different from the ones reported during training. Any idea why this happens?
There are two different parts: first, validation at training time uses an IoU threshold of 0.7; second, the padding parameter is calculated from the batch input.
Setting --iou-thres to 0.7 while running test.py didn't change the result. The results are almost perfect; everything is either 1 or 0.9999, which doesn't make any sense.
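For context on what that flag controls: in test.py, --iou-thres is (to my understanding) the NMS overlap threshold, i.e. how aggressively overlapping detections are merged, not the IoU used to score detections against ground truth. A minimal greedy-NMS sketch (plain Python, not the yolov7 implementation) shows why nudging it from 0.65 to 0.7 only affects borderline overlaps, and so would not be expected to fix near-perfect metrics:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def greedy_nms(boxes, scores, iou_thres):
    """Keep the highest-scoring box; drop others overlapping it above iou_thres."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if box_iou(boxes[i], boxes[j]) <= iou_thres]
    return keep

boxes = [(0, 0, 100, 100), (19, 0, 119, 100)]  # pairwise IoU ≈ 0.68
scores = [0.9, 0.8]
print(greedy_nms(boxes, scores, 0.65))  # [0]    second box suppressed
print(greedy_nms(boxes, scores, 0.70))  # [0, 1] second box survives
```

Since only detections overlapping in that 0.65-0.70 band change fate, an all-1.0 result points at something other than this threshold.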
Hi, I have the exact same problem. Did you get it solved?