target[i][0] and target[i][1] are always zero for i=1,2 in train_step (train.py)
Thank you for providing this great and useful code!
I am now trying to train the model on COCO data starting from the provided weights, "yolov4.weights"; my ultimate goal is pruning/quantization-aware training. My modifications are:
- generated "val2017.txt" so that it points to my COCO validation images.
- downloaded the weights, "yolov4.weights", and placed them in the "weight" directory.
To verify that train.py works, I ran
python train.py --weights ./weight/yolov4.weights
Then, the following result was displayed:
=> STEP 1/123800 lr: 0.001000 giou_loss: 11.34 conf_loss: 79.89 prob_loss: 10.53 total_loss: 101.77
=> STEP 2/123800 lr: 0.000000 giou_loss: 7.95 conf_loss: 46.94 prob_loss: 29.53 total_loss: 84.42
=> STEP 3/123800 lr: 0.000001 giou_loss: 9.74 conf_loss: 52.53 prob_loss: 22.90 total_loss: 85.17
=> STEP 4/123800 lr: 0.000001 giou_loss: 5.73 conf_loss: 45.76 prob_loss: 9.78 total_loss: 61.27
=> STEP 5/123800 lr: 0.000001 giou_loss: 12.86 conf_loss: 63.69 prob_loss: 18.11 total_loss: 94.65
=> STEP 6/123800 lr: 0.000001 giou_loss: 4.04 conf_loss: 29.02 prob_loss: 12.42 total_loss: 45.48
=> STEP 7/123800 lr: 0.000001 giou_loss: 12.05 conf_loss: 76.52 prob_loss: 19.87 total_loss: 108.44
=> STEP 8/123800 lr: 0.000002 giou_loss: 2.67 conf_loss: 8.95 prob_loss: 1.38 total_loss: 13.00
=> STEP 9/123800 lr: 0.000002 giou_loss: 24.18 conf_loss: 154.82 prob_loss: 50.40 total_loss: 229.40
=> STEP 10/123800 lr: 0.000002 giou_loss: 7.75 conf_loss: 38.06 prob_loss: 5.73 total_loss: 51.54
=> STEP 11/123800 lr: 0.000002 giou_loss: 19.06 conf_loss: 92.25 prob_loss: 27.02 total_loss: 138.33
=> STEP 12/123800 lr: 0.000002 giou_loss: 13.39 conf_loss: 74.36 prob_loss: 52.83 total_loss: 140.58
=> STEP 13/123800 lr: 0.000003 giou_loss: 6.14 conf_loss: 43.05 prob_loss: 17.16 total_loss: 66.35
=> STEP 14/123800 lr: 0.000003 giou_loss: 10.43 conf_loss: 44.24 prob_loss: 16.38 total_loss: 71.04
=> STEP 15/123800 lr: 0.000003 giou_loss: 10.52 conf_loss: 62.30 prob_loss: 23.42 total_loss: 96.24
=> STEP 16/123800 lr: 0.000003 giou_loss: 20.54 conf_loss: 83.18 prob_loss: 38.29 total_loss: 142.01
=> STEP 17/123800 lr: 0.000003 giou_loss: 6.66 conf_loss: 40.61 prob_loss: 11.53 total_loss: 58.81
=> STEP 18/123800 lr: 0.000004 giou_loss: 20.00 conf_loss: 94.56 prob_loss: 39.48 total_loss: 154.03
=> STEP 19/123800 lr: 0.000004 giou_loss: 19.43 conf_loss: 84.69 prob_loss: 22.74 total_loss: 126.87
=> STEP 20/123800 lr: 0.000004 giou_loss: 21.21 conf_loss: 119.39 prob_loss: 49.07 total_loss: 189.66
I thought total_loss was too large given that I started from the pre-trained model. To investigate the loss values, I inserted the following statements at line 89 of train.py (inside the "train_step" function):
print("{} : target[i][0]".format(i))  # scale index i (0: small, 1: medium, 2: large objects)
print(np.max(target[i][0]))           # max over target[i][0], presumably the label tensor for scale i
print("---------------------------------------------")
print(np.max(target[i][1]))           # max over target[i][1], presumably the bbox tensor for scale i
print("####################")
The results are as follows.
0 : target[i][0]
360.5
---------------------------------------------
360.5
####################
1 : target[i][0]
0.0
---------------------------------------------
0.0
####################
2 : target[i][0]
0.0
---------------------------------------------
0.0
####################
=> STEP 1/123800 lr: 0.001000 giou_loss: 17.50 conf_loss: 149.47 prob_loss: 36.95 total_loss: 203.93
0 : target[i][0]
391.5
---------------------------------------------
391.5
####################
1 : target[i][0]
0.0
---------------------------------------------
0.0
####################
2 : target[i][0]
0.0
---------------------------------------------
0.0
####################
=> STEP 2/123800 lr: 0.000000 giou_loss: 13.62 conf_loss: 71.56 prob_loss: 25.31 total_loss: 110.48
0 : target[i][0]
406.0
---------------------------------------------
406.0
####################
1 : target[i][0]
0.0
---------------------------------------------
0.0
####################
2 : target[i][0]
0.0
---------------------------------------------
0.0
####################
=> STEP 3/123800 lr: 0.000001 giou_loss: 18.30 conf_loss: 107.09 prob_loss: 43.55 total_loss: 168.93
0 : target[i][0]
387.5
---------------------------------------------
387.5
####################
1 : target[i][0]
0.0
---------------------------------------------
0.0
####################
2 : target[i][0]
0.0
---------------------------------------------
0.0
####################
I found that target[1][0] and target[2][0] always contain only zeros. This seems to be the main reason for the large loss values. Furthermore, the resulting model reaches only a low mAP after 3000 steps (about 10.0 measured with pycocotools).
I have no idea how to fix this, and I don't know whether it is a bug or a mistake on my side. If you have any ideas on how to resolve this issue, please let me know. Thanks!
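For reference, in typical YOLOv4 training code the dataset assigns each ground-truth box to the output scale(s) whose anchors best match the box's width and height; if that matching is miscalibrated (for example, anchors given in pixels where per-grid units are expected), every box can end up on scale 0, leaving target[1] and target[2] all zero. A minimal, self-contained sketch of that assignment logic (the function name, anchor values, and threshold below are illustrative, not this repo's actual code):

```python
import numpy as np

def assign_scales(box_wh, anchors_per_scale, iou_thresh=0.3):
    """Return the indices of the scales whose anchors match box_wh.

    box_wh: (width, height) of a ground-truth box, in the same units
    as the anchors. A box matches a scale if the best width/height IoU
    against that scale's anchors exceeds iou_thresh.
    """
    matched = []
    for s, anchors in enumerate(anchors_per_scale):
        # width/height IoU: overlap of the box and each anchor when
        # both are centered at the same point
        inter = np.minimum(anchors, box_wh).prod(axis=1)
        union = anchors.prod(axis=1) + np.prod(box_wh) - inter
        if (inter / union).max() > iou_thresh:
            matched.append(s)
    return matched

# illustrative anchors in grid units, one set per output scale
anchors = [
    np.array([[1.5, 2.0], [2.4, 4.5], [5.0, 3.5]]),        # small objects
    np.array([[4.5, 7.5], [7.4, 3.5], [7.0, 8.0]]),        # medium objects
    np.array([[14.0, 11.0], [12.0, 21.0], [24.0, 20.0]]),  # large objects
]

print(assign_scales(np.array([2.0, 3.0]), anchors))    # -> [0]
print(assign_scales(np.array([20.0, 18.0]), anchors))  # -> [2]
```

If a check like this shows every box in your annotations matching only the first scale, the anchor configuration (or the stride scaling applied to it) would be the first thing to inspect.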
I have the same question. Have you solved it?
If anyone knows how the TensorFlow model can easily be loaded into Keras for pruning, that would be very useful!
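I have not tried this on this repo, but the usual route for a tf.keras model is the tensorflow_model_optimization package (its prune_low_magnitude wrapper). The underlying technique, magnitude pruning, can be sketched with plain NumPy (this is a conceptual illustration, not the tfmot API):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with (at least) the `sparsity` fraction
    of smallest-magnitude entries set to zero."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of entries to zero out
    if k == 0:
        return weights.copy()
    # magnitude of the k-th smallest entry serves as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

w = np.array([[0.9, -0.05, 0.4],
              [-0.01, 0.7, 0.02]])
print(magnitude_prune(w, 0.5))
# -> [[0.9 0.  0.4]
#     [0.  0.7 0. ]]
```

In practice you would let tfmot handle this per layer during training, but the sketch shows what "pruning" does to the weight tensors.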