yolov2-yolov3_PyTorch

Gradient explosion ('nan' loss) when training YOLOv3

Open wenlongli10 opened this issue 3 years ago • 5 comments

When I trained YOLOv3 with batch size = 4, 'nan' appeared in the loss. After debugging, I found that the infinite values first appear at line 118 of models/yolov3.py.
This happens because some values of the network output txtytwth_pred[:, :, 2:] are too large, so applying exp() to them overflows to infinity.

So this is not a problem with the author's implementation; the training was simply unstable. I reduced the learning rate to 1e-4 and training proceeded normally. Hope this helps others.
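A minimal, self-contained sketch of the failure mode (not the repo's code): the tensor values below are made up to show how a large raw tw/th prediction overflows exp() in float32, and clamping before exp() is one common mitigation besides lowering the learning rate.

```python
import torch

# Hypothetical raw predictions [tx, ty, tw, th]; values near 90+ overflow exp() in float32.
txtytwth_pred = torch.tensor([[0.3, -0.2, 95.0, 4.1]])
twth = txtytwth_pred[..., 2:]
print(torch.exp(twth))    # tensor([[inf, 60.3403]]) -> inf propagates into a nan loss

# One common mitigation: clamp the exponent before exp() so the box size stays finite.
safe_wh = torch.exp(torch.clamp(twth, max=20.0))
print(safe_wh)            # finite values
```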

wenlongli10 avatar Sep 01 '22 11:09 wenlongli10

@kill2013110 A batch size of 4 is too small for stable training. Lowering the learning rate helps.
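For reference, a minimal sketch of the fix described above in a typical PyTorch training script; the placeholder model and the other optimizer hyperparameters are assumptions, not the repo's actual configuration.

```python
import torch

model = torch.nn.Linear(10, 10)   # placeholder model, not the YOLOv3 detector
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=1e-4,                      # reduced learning rate that stabilized training in this issue
    momentum=0.9,
    weight_decay=5e-4,
)
```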

You could also try one of my other YOLO projects:

https://github.com/yjh0410/PyTorch_YOLO-Family

or:

https://github.com/yjh0410/FreeYOLO

yjh0410 avatar Sep 01 '22 12:09 yjh0410

Thank you for your reply; that solved my problem. I have another question: do you think the inference speed of this project is the best that can be achieved?

wenlongli10 avatar Sep 01 '22 12:09 wenlongli10

@yjh0410 In other words, do you think there is still room to improve the inference speed with PyTorch?

wenlongli10 avatar Sep 01 '22 12:09 wenlongli10

@kill2013110 I don't think so. The way I measure inference speed in this repo is not rigorous. I implemented a better inference-speed measurement in the benchmark.py file of my FreeYOLO project, following MMDetection's approach.
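For readers who land here, a minimal sketch of a more rigorous GPU timing loop in the spirit of MMDetection-style benchmarking; the function name, defaults, and structure are assumptions, not the actual benchmark.py from FreeYOLO.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, img_size=416, num_warmup=50, num_iters=200, device="cuda"):
    model.eval().to(device)
    dummy = torch.randn(1, 3, img_size, img_size, device=device)

    # Warm-up iterations: exclude CUDA context init and cudnn autotuning from the timing.
    for _ in range(num_warmup):
        model(dummy)

    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(num_iters):
        model(dummy)
    torch.cuda.synchronize()   # wait for all queued kernels before reading the clock
    elapsed = time.perf_counter() - start
    return num_iters / elapsed  # frames per second
```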

yjh0410 avatar Sep 01 '22 12:09 yjh0410

@yjh0410 OK, thanks.

wenlongli10 avatar Sep 01 '22 12:09 wenlongli10