
Problems with obj_loss

Open DP1701 opened this issue 1 year ago • 7 comments

Hello all,

I've now done three training runs, and each one turns out badly because overfitting is clearly visible in obj_loss. The first run was trained from scratch (orange), the second with pre-trained weights (blue) and the last with the transfer-learning technique (red). The cfg is always YOLOv7.yaml (for the transfer-learning run I only adjusted the number of classes -> 3).

The dataset consists of 12,000 images, split into 65% train, 20% val and 15% test. Three object classes should be detected. I did not change the hyperparameters. Does anyone have a tip on what I could change?

I was thinking more of increasing the data augmentation, but I'm not sure if that will help.

[image: obj_loss curves for the three training runs]

DP1701 avatar Jul 27 '22 07:07 DP1701

I'm also facing the same issue. I trained on custom data with 46 classes, and the model is giving lots of false predictions. [image]

akashAD98 avatar Jul 29 '22 06:07 akashAD98

@akashAD98 How many images are in the train folder?

DP1701 avatar Jul 29 '22 06:07 DP1701

@DP1701 More than 2 lakh (200,000); each class has almost 2,000 instances. From your experiments, the from-scratch and pre-trained runs seem to be doing well.

akashAD98 avatar Jul 29 '22 07:07 akashAD98

I am having the same issue, and I suspect it is due to missing annotations in the validation data. If an object is present in a validation image and the model predicts a bounding box for it but there is no annotation, the prediction will result in an increase in the loss. This would explain why my model appears to be doing better every epoch while the validation obj_loss keeps increasing.

However, I am not sure whether this is actually what obj_loss represents. Does anyone know if this could be what is happening here?
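For illustration only (a minimal sketch of the hypothesis, not yolov7's actual loss code): the objectness term is a BCE loss against the assigned targets, so a confident prediction on an object whose annotation is missing gets scored against a target of 0 and inflates obj_loss, no matter how correct the detection looks.

```python
# Minimal sketch of the missing-annotation hypothesis (illustration, not repo code).
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

pred_obj = torch.tensor([4.0])           # confident objectness logit (~98% "object here")
target_annotated = torch.tensor([1.0])   # object is labelled -> small loss
target_missing = torch.tensor([0.0])     # annotation is missing -> large loss

print(bce(pred_obj, target_annotated))   # ~0.018
print(bce(pred_obj, target_missing))     # ~4.018
```

If many validation images have unlabelled objects, this penalty grows as the model becomes more confident, which would match the rising curves above.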

WikkAI avatar Jul 29 '22 10:07 WikkAI

@WikkAI Maybe that is it, but I am absolutely not sure. With the same dataset I could not observe this behavior with an older YOLO version.

@WongKinYiu Do you perhaps have an explanation for us?

DP1701 avatar Jul 29 '22 12:07 DP1701

https://github.com/WongKinYiu/yolov7/issues/243

akashAD98 avatar Aug 01 '22 04:08 akashAD98

I'm facing the same issue: the validation obj_loss gradually increases. How can I solve it, and why does it happen?

KevenLee avatar Aug 01 '22 09:08 KevenLee

I'm having the same issue. Validation metrics look great but validation obj_loss increases right from the start.

[images: validation metrics and val obj_loss curves]

kaminetzky avatar Aug 12 '22 17:08 kaminetzky

Has anyone figured out what may be causing the problem?

DP1701 avatar Aug 17 '22 19:08 DP1701

After running into this problem again, I set out to locate the cause. We observe that the validation obj_loss is increasing instead of decreasing. This problem is also mentioned in https://github.com/WongKinYiu/yolov7/issues/120, https://github.com/WongKinYiu/yolov7/issues/243, https://github.com/WongKinYiu/yolov7/issues/665 and https://github.com/WongKinYiu/yolov7/issues/1241. As some (@DP1701) have mentioned, the obj_loss behaviour does not appear to be caused solely by the data, since other YOLO versions do not show it on the same dataset. I compared yolov7's code against yolov5's to see which changes could be responsible.

In yolov7, the training loss automatically uses ComputeLossOTA, which is based on the OTA paper, while the validation loss uses the ComputeLoss function, which is very similar to the loss used in the yolov5 repo. I suspected this mismatch might be the cause of the obj_loss behaviour. To confirm it, I ran two trainings with identical hyperparameters, datasets and seeds: yolov7-lowtide, which uses the ComputeLoss function for both training and validation loss, and yolov7-riptide, which uses ComputeLossOTA for training and ComputeLoss for validation.
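For context, here is a simplified sketch of the only code difference between the two runs. The variable names (model, pred, targets, imgs, device) come from the usual training loop in train.py, and `use_ota` is a flag I introduced for the experiment, so treat this as a sketch rather than a drop-in patch:

```python
# Sketch of the difference between the yolov7-lowtide and yolov7-riptide runs.
# ComputeLoss / ComputeLossOTA are the loss classes in this repo's utils/loss.py.
# (In train.py the loss objects are built once before the loop, not per call.)
from utils.loss import ComputeLoss, ComputeLossOTA

def training_loss(model, pred, targets, imgs, device, use_ota):
    if use_ota:
        # "yolov7-riptide": default yolov7 behaviour, simOTA-based target assignment
        return ComputeLossOTA(model)(pred, targets.to(device), imgs)
    # "yolov7-lowtide": the same yolov5-style loss that test.py uses for validation
    return ComputeLoss(model)(pred, targets.to(device))
```

In both cases the validation loss logged via test.py comes from ComputeLoss, which is why the default run's training and validation curves are not measuring the same thing.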

As can be seen in the image below, yolov7-riptide shows the problematic obj_loss as expected, while the training without ComputeLossOTA does not show this behaviour and instead shows a steady decline in both training and validation loss. [image]

However, even though its obj_loss exhibits this strange behaviour, the ComputeLossOTA run (riptide) performs better on the evaluation metrics, despite having trained for only half as many epochs as the other run up to this point. Below are the images for mAP, precision and recall. [images]

I suspect that the mismatch between ComputeLossOTA in training and ComputeLoss in validation makes it appear as though the run is severely overfitting. However, I still cannot explain why the validation obj_loss is actually increasing while all evaluation metrics are improving.
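One practical note, under the assumption that yolov7 kept the yolov5-style checkpoint selection (the fitness function that I believe lives in utils/metrics.py): best.pt is chosen by a weighted mAP score, not by the validation loss, so the rising obj_loss should not change which checkpoint is kept; it mainly makes the loss curves misleading.

```python
# Sketch of the yolov5-style fitness score used to pick best.pt; I am assuming
# yolov7 inherited this function unchanged. Validation loss is not part of it.
import numpy as np

def fitness(x):
    # x: array of shape (n, 4) with columns [P, R, mAP@0.5, mAP@0.5:0.95]
    w = [0.0, 0.0, 0.1, 0.9]  # mAP@0.5:0.95 dominates the score
    return (x[:, :4] * w).sum(1)

print(fitness(np.array([[0.9, 0.8, 0.7, 0.5]])))  # -> [0.52]
```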

WikkAI avatar Jan 02 '23 12:01 WikkAI

@WikkAI @WongKinYiu @AlexeyAB any update on this issue?

akashAD98 avatar Feb 01 '23 05:02 akashAD98

I tried it on yolov7 and it does seem to have something to do with ComputeLossOTA

yyxou2022 avatar Feb 07 '23 15:02 yyxou2022