pytorch-YOLOv4

high object loss during training

Open BELZHANG opened this issue 4 years ago • 6 comments

I trained a model for a single-class object detection task. The obj loss stopped decreasing at around 2000+ after 30 epochs:

2020-07-02 02:14:47,659 train.py[line:388] DEBUG: Train step_54720: loss : 2443.021484375,loss xy : 10.428301811218262,loss wh : 0.4252524971961975,loss obj : 2431.867431640625,loss cls : 0.3006296753883362,loss l2 : 149.89588928222656,lr : 0.0001
2020-07-02 02:16:22,901 train.py[line:388] DEBUG: Train step_55040: loss : 2481.62939453125,loss xy : 19.018123626708984,loss wh : 3.39251708984375,loss obj : 2458.537841796875,loss cls : 0.6808265447616577,loss l2 : 160.90220642089844,lr : 0.0001
2020-07-02 02:17:58,271 train.py[line:388] DEBUG: Train step_55360: loss : 2519.954345703125,loss xy : 15.710888862609863,loss wh : 8.115646362304688,loss obj : 2495.658447265625,loss cls : 0.4693739712238312,loss l2 : 175.01007080078125,lr : 0.0001
2020-07-02 02:19:33,424 train.py[line:388] DEBUG: Train step_55680: loss : 2428.70263671875,loss xy : 9.945980072021484,loss wh : 0.7127013206481934,loss obj : 2417.80322265625,loss cls : 0.24069327116012573,loss l2 : 149.42962646484375,lr : 0.0001
2020-07-02 02:21:08,341 train.py[line:388] DEBUG: Train step_56000: loss : 2504.226806640625,loss xy : 20.75212860107422,loss wh : 5.360098361968994,loss obj : 2477.334716796875,loss cls : 0.7798290252685547,loss l2 : 169.95872497558594,lr : 0.0001
2020-07-02 02:22:43,993 train.py[line:388] DEBUG: Train step_56320: loss : 2441.465087890625,loss xy : 24.657085418701172,loss wh : 2.177201509475708,loss obj : 2413.60498046875,loss cls : 1.0258153676986694,loss l2 : 155.5187225341797,lr : 0.0001
2020-07-02 02:23:31,693 train.py[line:399] INFO: Checkpoint 32 saved !

What could be the possible reasons?

Thanks

BELZHANG avatar Jul 02 '20 15:07 BELZHANG
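To see how the total loss in the opening post breaks down, the log records can be parsed and the components compared. The following is a minimal sketch, not part of the repository: `parse_record` and `obj_share` are hypothetical helper names, and the record format is assumed to match the first log line quoted in the opening post.

```python
import re

# First record copied from the log in the opening post.
LOG_LINE = (
    "2020-07-02 02:14:47,659 train.py[line:388] DEBUG: Train step_54720: "
    "loss : 2443.021484375,loss xy : 10.428301811218262,"
    "loss wh : 0.4252524971961975,loss obj : 2431.867431640625,"
    "loss cls : 0.3006296753883362,loss l2 : 149.89588928222656,lr : 0.0001"
)

# Matches "loss : v", "loss xy : v", ..., and "lr : v".
FIELD = re.compile(r"(loss(?: \w+)?|lr) : ([0-9.eE+-]+)")


def parse_record(line):
    """Hypothetical helper: return (step, {field name: value}) for one record."""
    step = int(re.search(r"step_(\d+)", line).group(1))
    fields = {name: float(value) for name, value in FIELD.findall(line)}
    return step, fields


def obj_share(fields):
    """Fraction of the reported total loss contributed by the obj term."""
    return fields["loss obj"] / fields["loss"]


step, fields = parse_record(LOG_LINE)
component_sum = sum(fields[k] for k in ("loss xy", "loss wh", "loss obj", "loss cls"))

print(f"step {step}: obj share of total = {obj_share(fields):.4f}")
print(f"xy + wh + obj + cls = {component_sum:.3f}, reported total = {fields['loss']:.3f}")
```

For this record the obj term accounts for more than 99% of the reported total, and the xy, wh, obj and cls terms add up to the reported `loss` value, which suggests that `loss l2` is logged separately and is not included in the total.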

I am also facing this problem. Are there any solutions?

DongChen06 avatar Jul 03 '20 04:07 DongChen06

How many pictures are there in your dataset?

Tianxiaomo avatar Jul 03 '20 05:07 Tianxiaomo

@Tianxiaomo I use 144 training images and 16 testing images. I set the parameters like this:

Cfg.batch = 16
Cfg.subdivisions = 8

DongChen06 avatar Jul 03 '20 13:07 DongChen06
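For a sense of scale, the settings above can be turned into a rough count of weight updates per epoch. This is a sketch under a stated assumption: it follows the Darknet-style convention in which each forward pass uses `batch // subdivisions` images and the weights are updated once every `subdivisions` passes. Whether the training script behaves exactly this way should be checked against its source. `steps_per_epoch` is a hypothetical helper name.

```python
import math


def steps_per_epoch(num_images, batch, subdivisions):
    """Hypothetical helper, assuming Darknet-style batch/subdivisions semantics."""
    # Assumption: each forward/backward pass sees batch // subdivisions images.
    mini_batch = batch // subdivisions
    # Forward passes needed to see every training image once.
    forward_passes = math.ceil(num_images / mini_batch)
    # Assumption: one optimizer update per `subdivisions` forward passes.
    optimizer_updates = forward_passes // subdivisions
    return mini_batch, forward_passes, optimizer_updates


# Values quoted in the comment above: 144 training images, batch 16, subdivisions 8.
mini_batch, forward_passes, optimizer_updates = steps_per_epoch(144, 16, 8)
print(mini_batch, forward_passes, optimizer_updates)
```

Under these assumptions, one epoch over 144 images amounts to only a handful of weight updates, so a run on a dataset this small would need many epochs to accumulate a meaningful number of iterations.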

Train the model for enough iterations to achieve good results. You can also review the validation results after each epoch during training.

Tianxiaomo avatar Jul 03 '20 13:07 Tianxiaomo

I also get a high loss during my training. Could you help me?
Training size: 100000
Dataset classes: 1
[image]

Iterate the model enough times to achieve good results, and you can review the validation of each epoch during training.

shuangzixing avatar Jul 11 '20 02:07 shuangzixing

I have the same problem. Have you solved it?

Pigdrum avatar Dec 20 '21 08:12 Pigdrum