
target[i][0] and target[i][1] are always zero for i=1,2 in train_step (train.py)

Open HirofumiTsuda opened this issue 5 years ago • 2 comments

Thank you for providing this great/useful code!

I am now trying to train the model on COCO data with the provided weights, "yolov4.weights"; my ultimate goal is pruning/quantization-aware training. My modifications are:

  • compiled "val2017.txt" so that my COCO validation images are loaded.
  • downloaded the weights, "yolov4.weights", and placed them in the "weight" directory.

To verify that train.py works correctly, I executed:

python train.py --weights ./weight/yolov4.weights

The following result was displayed:

=> STEP    1/123800   lr: 0.001000   giou_loss: 11.34   conf_loss: 79.89   prob_loss: 10.53   total_loss: 101.77
=> STEP    2/123800   lr: 0.000000   giou_loss: 7.95   conf_loss: 46.94   prob_loss: 29.53   total_loss: 84.42
=> STEP    3/123800   lr: 0.000001   giou_loss: 9.74   conf_loss: 52.53   prob_loss: 22.90   total_loss: 85.17
=> STEP    4/123800   lr: 0.000001   giou_loss: 5.73   conf_loss: 45.76   prob_loss: 9.78   total_loss: 61.27
=> STEP    5/123800   lr: 0.000001   giou_loss: 12.86   conf_loss: 63.69   prob_loss: 18.11   total_loss: 94.65
=> STEP    6/123800   lr: 0.000001   giou_loss: 4.04   conf_loss: 29.02   prob_loss: 12.42   total_loss: 45.48
=> STEP    7/123800   lr: 0.000001   giou_loss: 12.05   conf_loss: 76.52   prob_loss: 19.87   total_loss: 108.44
=> STEP    8/123800   lr: 0.000002   giou_loss: 2.67   conf_loss: 8.95   prob_loss: 1.38   total_loss: 13.00
=> STEP    9/123800   lr: 0.000002   giou_loss: 24.18   conf_loss: 154.82   prob_loss: 50.40   total_loss: 229.40
=> STEP   10/123800   lr: 0.000002   giou_loss: 7.75   conf_loss: 38.06   prob_loss: 5.73   total_loss: 51.54
=> STEP   11/123800   lr: 0.000002   giou_loss: 19.06   conf_loss: 92.25   prob_loss: 27.02   total_loss: 138.33
=> STEP   12/123800   lr: 0.000002   giou_loss: 13.39   conf_loss: 74.36   prob_loss: 52.83   total_loss: 140.58
=> STEP   13/123800   lr: 0.000003   giou_loss: 6.14   conf_loss: 43.05   prob_loss: 17.16   total_loss: 66.35
=> STEP   14/123800   lr: 0.000003   giou_loss: 10.43   conf_loss: 44.24   prob_loss: 16.38   total_loss: 71.04
=> STEP   15/123800   lr: 0.000003   giou_loss: 10.52   conf_loss: 62.30   prob_loss: 23.42   total_loss: 96.24
=> STEP   16/123800   lr: 0.000003   giou_loss: 20.54   conf_loss: 83.18   prob_loss: 38.29   total_loss: 142.01
=> STEP   17/123800   lr: 0.000003   giou_loss: 6.66   conf_loss: 40.61   prob_loss: 11.53   total_loss: 58.81
=> STEP   18/123800   lr: 0.000004   giou_loss: 20.00   conf_loss: 94.56   prob_loss: 39.48   total_loss: 154.03
=> STEP   19/123800   lr: 0.000004   giou_loss: 19.43   conf_loss: 84.69   prob_loss: 22.74   total_loss: 126.87
=> STEP   20/123800   lr: 0.000004   giou_loss: 21.21   conf_loss: 119.39   prob_loss: 49.07   total_loss: 189.66

I thought the total_loss was too large given that I used the pre-trained weights. To investigate the loss values, I inserted the following statements at line 89 of train.py (inside the train_step function):

                # i indexes the three YOLO output scales
                print("{} : target[i][0]".format(i))
                print(np.max(target[i][0]))  # max of the label tensor
                print("---------------------------------------------")
                print(np.max(target[i][1]))  # max of the bbox tensor
                print("####################")

The output was as follows:

0 : target[i][0]
360.5
---------------------------------------------
360.5
####################
1 : target[i][0]
0.0
---------------------------------------------
0.0
####################
2 : target[i][0]
0.0
---------------------------------------------
0.0
####################
=> STEP    1/123800   lr: 0.001000   giou_loss: 17.50   conf_loss: 149.47   prob_loss: 36.95   total_loss: 203.93
0 : target[i][0]
391.5
---------------------------------------------
391.5
####################
1 : target[i][0]
0.0
---------------------------------------------
0.0
####################
2 : target[i][0]
0.0
---------------------------------------------
0.0
####################
=> STEP    2/123800   lr: 0.000000   giou_loss: 13.62   conf_loss: 71.56   prob_loss: 25.31   total_loss: 110.48
0 : target[i][0]
406.0
---------------------------------------------
406.0
####################
1 : target[i][0]
0.0
---------------------------------------------
0.0
####################
2 : target[i][0]
0.0
---------------------------------------------
0.0
####################
=> STEP    3/123800   lr: 0.000001   giou_loss: 18.30   conf_loss: 107.09   prob_loss: 43.55   total_loss: 168.93
0 : target[i][0]
387.5
---------------------------------------------
387.5
####################
1 : target[i][0]
0.0
---------------------------------------------
0.0
####################
2 : target[i][0]
0.0
---------------------------------------------
0.0
####################

I found that target[1][0] and target[2][0] are always zero. This seems to be the main reason for the large loss values. Furthermore, the resulting model has a low mAP after only 3000 steps (about 10.0 with pycocotools).
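A minimal diagnostic sketch (not the repo's code) for the symptom above: count how many positive anchors each output scale actually received. It assumes `target` has the shape train.py's data pipeline yields — a list of (label, bboxes) pairs, one per YOLO scale, where channel 4 of `label` is the objectness flag:

```python
import numpy as np

def count_positives(target):
    """Count assigned (positive) anchors per YOLO output scale.

    Assumes `target` is a list of (label, bboxes) pairs, one per scale,
    with `label` shaped (batch, grid, grid, anchors, 5 + num_classes)
    and channel 4 holding the objectness flag.
    """
    counts = []
    for i, (label, bboxes) in enumerate(target):
        positives = int(np.sum(label[..., 4] > 0))
        print("scale {}: {} positive anchors".format(i, positives))
        counts.append(positives)
    return counts
```

If every scale but the first reports 0, the bug is in the target construction (the dataset's preprocessing) rather than in the loss computation itself.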

I have no idea how to fix this, and I don't know whether it is a bug or a mistake on my side. If you have any ideas on how to resolve this issue, please let me know. Thanks!
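For reference, one failure mode that produces exactly this symptom is an anchor/stride mismatch in the config: YOLO-style preprocessing assigns a ground-truth box to the scale(s) whose anchors overlap it above an IoU threshold, so if the anchors are supplied in the wrong units (pixels vs. grid cells), every box can end up matching only the first scale. Below is a hypothetical illustration of that assignment logic, not the repo's actual implementation; the anchor values are the standard YOLOv4 anchors in pixels and the 0.3 threshold is an assumption:

```python
import numpy as np

# Standard YOLOv4 anchors in pixels: (3 scales, 3 anchors, width/height).
ANCHORS = np.array([
    [[12, 16], [19, 36], [40, 28]],
    [[36, 75], [76, 55], [72, 146]],
    [[142, 110], [192, 243], [459, 401]],
], dtype=np.float32)

def wh_iou(wh, anchors):
    # IoU of boxes sharing the same center, from widths/heights only.
    inter = np.minimum(wh[0], anchors[..., 0]) * np.minimum(wh[1], anchors[..., 1])
    union = wh[0] * wh[1] + anchors[..., 0] * anchors[..., 1] - inter
    return inter / union

def matching_scales(box_wh, anchors=ANCHORS, thresh=0.3):
    # Scale indices whose anchors overlap the box above the threshold.
    return [i for i in range(anchors.shape[0])
            if np.any(wh_iou(box_wh, anchors[i]) > thresh)]
```

With correctly scaled anchors, a 100x100-pixel box matches the medium and large scales, while a 15x20 box matches only the small scale; if the anchors were accidentally divided by the stride twice (or not at all), such boxes would collapse onto a single scale.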

HirofumiTsuda avatar Oct 20 '20 10:10 HirofumiTsuda

I have the same question. Have you solved it?

tjpulfn avatar Feb 23 '21 03:02 tjpulfn

If anyone knows how the TensorFlow model can easily be loaded into Keras for pruning, that would be very useful!

rossGardiner avatar Aug 11 '21 12:08 rossGardiner