yolov2-yolov3_PyTorch

Gradient explosion ('nan' loss) when training YOLOv3

Open wenlongli10 opened this issue 3 years ago • 5 comments

When I trained YOLOv3 with batch size = 4, 'nan' appeared in the loss. After debugging, I found that the infinite values first appear at line 118 of models/yolov3.py.
This happens because some values of the network output txtytwth_pred[:, :, 2:] are too large, so applying exp() to them overflows to infinity.

So this is not a problem with the author's implementation; the training was simply unstable. I reduced the learning rate to 1e-4 and training proceeded normally. Hope this helps others.
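A minimal, self-contained sketch of the failure mode (not the repo's code): the tensor values below are made up to show how a large raw tw/th prediction overflows exp() in float32, and clamping before exp() is one common mitigation besides lowering the learning rate.

```python
import torch

# Hypothetical raw predictions [tx, ty, tw, th]; values near 90+ overflow exp() in float32.
txtytwth_pred = torch.tensor([[0.3, -0.2, 95.0, 4.1]])
twth = txtytwth_pred[..., 2:]
print(torch.exp(twth))    # tensor([[inf, 60.3403]]) -> inf propagates into a nan loss

# One common mitigation: clamp the exponent before exp() so the box size stays finite.
safe_wh = torch.exp(torch.clamp(twth, max=20.0))
print(safe_wh)            # finite values
```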

wenlongli10 avatar Sep 01 '22 11:09 wenlongli10

@kill2013110 A batch size of 4 is too small for stable training. Lowering the learning rate helps.
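For reference, a minimal sketch of the fix described above in a typical PyTorch training script; the placeholder model and the other optimizer hyperparameters are assumptions, not the repo's actual configuration.

```python
import torch

model = torch.nn.Linear(10, 10)   # placeholder model, not the YOLOv3 detector
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=1e-4,                      # reduced learning rate that stabilized training in this issue
    momentum=0.9,
    weight_decay=5e-4,
)
```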

You could also try one of my other YOLO projects:

https://github.com/yjh0410/PyTorch_YOLO-Family

or:

https://github.com/yjh0410/FreeYOLO

yjh0410 avatar Sep 01 '22 12:09 yjh0410

Thank you for your reply; that solved my problem. I have another question: do you think the inference speed of this project is the best that can be achieved?

wenlongli10 avatar Sep 01 '22 12:09 wenlongli10

@yjh0410 In other words, do you think there is still room to improve the inference speed with PyTorch?

wenlongli10 avatar Sep 01 '22 12:09 wenlongli10

@kill2013110 I don't think so. The way I measure inference speed in this repo is not rigorous. I implemented a better inference-speed measurement in the benchmark.py file of my FreeYOLO project, following MMDetection's approach.
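For readers who land here, a minimal sketch of a more rigorous GPU timing loop in the spirit of MMDetection-style benchmarking; the function name, defaults, and structure are assumptions, not the actual benchmark.py from FreeYOLO.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, img_size=416, num_warmup=50, num_iters=200, device="cuda"):
    model.eval().to(device)
    dummy = torch.randn(1, 3, img_size, img_size, device=device)

    # Warm-up iterations: exclude CUDA context init and cudnn autotuning from the timing.
    for _ in range(num_warmup):
        model(dummy)

    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(num_iters):
        model(dummy)
    torch.cuda.synchronize()   # wait for all queued kernels before reading the clock
    elapsed = time.perf_counter() - start
    return num_iters / elapsed  # frames per second
```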

yjh0410 avatar Sep 01 '22 12:09 yjh0410

@yjh0410 OK, thanks.

wenlongli10 avatar Sep 01 '22 12:09 wenlongli10