yolov7
Validation performance during training differs from after training
Hey,
Basically, after I fine-tune a model and it runs the final performance check, the validation metrics report something like 0.60 mAP on my custom dataset, which seems reasonable.
Then when I evaluate performance on the same validation set used during training, I suddenly get a near-perfect precision-recall tradeoff and an mAP close to 0.90. All of the correct bounding boxes seem to have confidence 1.0 and IoU close to 1.0, while the remaining boxes produce a lot of false positives, though with low confidence.
What could cause this difference? As far as I understand, the validation set during training is only used for things like scheduling the learning rate, so the model shouldn't train on that data, and I should be able to evaluate performance on it.
I'm using the following commands to produce this:
python yolov7/train.py --weights model/yolov7_training.pt --data data/custom.yaml --cfg yolov7/cfg/training/yolov7.yaml --multi-scale --batch-size 8 --img 640 640 --epochs 25 --cache-images --hyp data/hyp.scratch.custom.yaml
python yolov7/test.py --data data/custom.yaml --img 640 --batch 24 --conf 0.001 --iou 0.65 --device 0 --weights model/yolov7_finetuned.pt --save-txt --save-hybrid --save-json --save-conf --augment --task val
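One thing worth noting when comparing the two numbers: the test.py invocation above enables several flags (--augment, --save-hybrid, --save-conf) that, as far as I can tell, the end-of-training evaluation inside train.py does not use, so the two runs are not configured identically. A stripped-down re-run with only the shared settings might look like this (paths taken from the command above; this is a sketch for comparison, not a verified fix):

```shell
# Re-run the same evaluation without the extra flags, so it is as close as
# possible to what train.py runs at the end of training.
python yolov7/test.py \
    --data data/custom.yaml \
    --img 640 \
    --batch 24 \
    --conf 0.001 \
    --iou 0.65 \
    --device 0 \
    --weights model/yolov7_finetuned.pt \
    --task val
```

If this stripped-down run reproduces the training-time numbers, the discrepancy comes from one of the removed flags rather than from the model itself.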
Hey, bumping this as it's really relevant for me, and the question seems simple: how is the validation set used during training?
There are two different parts: first, validation at training time uses an IoU threshold of 0.7; second, the padding parameter is calculated from the batch input.
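To see why a small IoU-threshold gap (0.65 in the test.py command above vs. 0.7 at training time) can matter at all: a detection whose overlap with the ground truth lands between the two values is treated differently by each run. A minimal sketch in plain Python (not yolov7 source, just the standard IoU formula with made-up box coordinates):

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

gt = (0, 0, 100, 100)     # ground-truth box
pred = (19, 0, 119, 100)  # prediction shifted 19 px to the right
iou = box_iou(gt, pred)   # 8100 / 11900 ≈ 0.681

print(iou > 0.65, iou > 0.70)  # True False: passes one threshold, not the other
```

Only boxes in that narrow borderline band flip, which is why a threshold change of this size usually shifts metrics slightly rather than turning 0.65 mAP into 0.995.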
I'm having the same issue working on YOLOv7 for a face detection task. Here is the reported performance on the validation set during training:
Epoch gpu_mem box obj cls total labels img_size
319/319 20G 0.04121 0.003993 0 0.0452 61 640: 100% 187/187 [01:13<00:00, 2.54it/s]
Class Images Labels P R mAP@.5 mAP@.5:.95: 100% 24/24 [00:06<00:00, 3.72it/s]
all 1028 1854 0.85 0.542 0.646 0.325
and here is the result of evaluating the validation set using test.py:
!python test.py --data data/ufdd.yaml --task val --single-cls --verbose --save-hybrid --save-conf --batch 32 --device 0 --weights runs/train/yolov7ufdd7/weights/best.pt --name yolov7ufdd7_val
val: Scanning 'val-ufdd/val.cache' images and labels... 1028 found, 0 missing, 553 empty, 0 corrupted: 100% 1028/1028 [00:00<?, ?it/s]
Class Images Labels P R mAP@.5 mAP@.5:.95: 100% 33/33 [04:31<00:00, 8.24s/it]
all 1028 1854 1 1 0.995 0.995
As you can see, the results are near perfect and not at all realistic, not to mention different from the ones reported during training. Any idea why this happens?
There are two different parts: first, validation at training time uses an IoU threshold of 0.7; second, the padding parameter is calculated from the batch input.
Setting --iou-thres to 0.7 while running test.py didn't change the result. The results are almost perfect; everything is either 1 or 0.9999, which doesn't make any sense.
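For context on what that flag controls: in test.py, --iou-thres is (to my understanding) the NMS overlap threshold, i.e. how aggressively overlapping detections are merged, not the IoU used to score detections against ground truth. A minimal greedy-NMS sketch (plain Python, not the yolov7 implementation) shows why nudging it from 0.65 to 0.7 only affects borderline overlaps, and so would not be expected to fix near-perfect metrics:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def greedy_nms(boxes, scores, iou_thres):
    """Keep the highest-scoring box; drop others overlapping it above iou_thres."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if box_iou(boxes[i], boxes[j]) <= iou_thres]
    return keep

boxes = [(0, 0, 100, 100), (19, 0, 119, 100)]  # pairwise IoU ≈ 0.68
scores = [0.9, 0.8]
print(greedy_nms(boxes, scores, 0.65))  # [0]    second box suppressed
print(greedy_nms(boxes, scores, 0.70))  # [0, 1] second box survives
```

Since only detections overlapping in that 0.65-0.70 band change fate, an all-1.0 result points at something other than this threshold.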
Hi, I have the exact same problem. Did you get it solved?