Problems with obj_loss
Hello all,
I've done three trainings now and each one turned out badly because overfitting is visible in `obj_loss`. The first I trained from scratch (orange), the second with pre-trained weights (blue), and the last with transfer learning (red). The cfg is always YOLOv7.yaml (for transfer learning I only adjusted the number of classes -> 3).
The dataset consists of 12000 images, split 65% (train) / 20% (val) / 15% (test). The model should detect three object classes. I did not change the hyperparameters. Does anyone have a tip on what I could change?
I was leaning towards increasing the data augmentation, but I'm not sure if that will help.
I'm also facing the same issue. I trained on custom data with 46 classes, and the model is giving lots of false predictions.
@akashAD98 How many images are in the train folder?
@DP1701 More than 200,000 (2 lakh); each class has almost 2000 instances. From your experiments, the from-scratch and pre-trained models seem to be doing well.
I am having the same issue, and I suspect it is due to missing annotations in the validation data. If an object is present in a validation image and the model predicts a bounding box for it but there is no annotation for it, the prediction will increase the loss. This would explain why my model appears to be doing better every epoch while the validation `obj_loss` keeps increasing.
However, I am not sure whether this is really what `obj_loss` represents. Does anyone know if this could be what is happening here?
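To illustrate the hypothesis with a toy calculation (plain binary cross-entropy on a single objectness score, ignoring yolov7's exact weighting and assignment): a correct detection of an object that is missing from the labels gets an objectness target of 0, so its loss contribution grows as the model becomes more confident, i.e. as the model gets better.

```python
import math

def bce(pred, target):
    """Binary cross-entropy for a single objectness score."""
    eps = 1e-7
    pred = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(pred) + (1 - target) * math.log(1 - pred))

# Prediction on an object that has no annotation -> target objectness is 0,
# so a *better* (more confident) detector is penalized *more*:
for conf in (0.5, 0.9, 0.99):
    print(f"confidence {conf:.2f} -> obj_loss contribution {bce(conf, 0.0):.3f}")
# confidence 0.50 -> obj_loss contribution 0.693
# confidence 0.90 -> obj_loss contribution 2.303
# confidence 0.99 -> obj_loss contribution 4.605
```

So even a handful of unlabeled objects in the val set could, in principle, push the summed `obj_loss` up over training while the metrics keep improving.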
@WikkAI Maybe, but I am really not sure. With the same dataset, I could not observe this behavior with an older YOLO version.
@WongKinYiu Do you perhaps have an explanation for us?
https://github.com/WongKinYiu/yolov7/issues/243
I'm facing the same issue: the validation-set `obj_loss` gradually increases. Why does this happen, and how can I solve it?
I'm having the same issue. Validation metrics look great but validation obj_loss increases right from the start.
![image](https://user-images.githubusercontent.com/11022568/184412274-55d8c171-b222-4f93-afe4-179ee11d4ba6.png)
![image](https://user-images.githubusercontent.com/11022568/184412313-054ec65a-5492-4d26-9c28-bc7015fc1d7b.png)
Has anyone figured out what may be causing the problem?
After running into this problem again, I set out to locate the cause. We observe that the validation `obj_loss` is increasing instead of decreasing. This problem is also mentioned in https://github.com/WongKinYiu/yolov7/issues/120, https://github.com/WongKinYiu/yolov7/issues/243, https://github.com/WongKinYiu/yolov7/issues/665 and https://github.com/WongKinYiu/yolov7/issues/1241. As some (@DP1701) have mentioned, the `obj_loss` behaviour does not appear to be caused solely by the data, since other YOLO versions do not show it on the same dataset. I compared yolov7's code against yolov5's to see what could be responsible.
In yolov7, the training loss automatically uses `ComputeLossOTA`, which is based on the OTA paper, while the validation loss uses the `ComputeLoss` function, which is very similar to the loss used in the yolov5 repo. I suspected that this might be the cause of the `obj_loss` behaviour. To confirm this, I ran two trainings with identical hyperparameters, datasets and seeds: `yolov7-lowtide`, which uses the `ComputeLoss` function for both training and validation loss, and `yolov7-riptide`, which uses `ComputeLossOTA` for training and `ComputeLoss` for validation loss.
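For anyone who wants to reproduce the lowtide setup: if I read `train.py` correctly, it falls back to `ComputeLoss` for training whenever the `loss_ota` entry in the hyperparameter yaml is 0 (please double-check this key against your checkout before relying on it):

```yaml
# excerpt of a hyp yaml (e.g. hyp.scratch.custom.yaml)
# assumes train.py checks hyp['loss_ota'] -- verify in your version of the repo
loss_ota: 0  # 0 = plain ComputeLoss for training; 1 or absent = ComputeLossOTA
```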
As can be seen in the image below, `yolov7-riptide` shows the problematic `obj_loss` as expected, whereas the training without `ComputeLossOTA` shows a steady decline of both training and validation loss.
However, even though its `obj_loss` exhibits this weird behaviour, the `ComputeLossOTA` run (riptide) performs better on the evaluation metrics, despite having trained for only half the epochs of the other run up to this point. Below are the plots for mAP, precision and recall.
I suspect that the mismatch between `ComputeLossOTA` in training and `ComputeLoss` in validation makes it appear as though the run is severely overfitting. However, I still cannot explain why the validation `obj_loss` actually increases while all evaluation metrics improve.
@WikkAI @WongKinYiu @AlexeyAB any update on this issue?
I tried it on yolov7 and it does seem to have something to do with `ComputeLossOTA`.