KL-Loss
why stop gradient?
Hello, may I ask why the gradient from the sigma prediction branch to the bbox branch is stopped in the code? If stopping the gradient of the sigma prediction branch is necessary, consider that a 2022 article proposed bridging the gap between the Gaussian distribution and the Dirac delta distribution by raising the distribution to a power of the IoU, addressing the fact that the two cannot be made exactly equivalent. In that case, does the gradient transmitted by the sigma loss also need to be stopped?
tbh, I don't exactly remember the details. You can try removing stopgrad and compare the performance.
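For clarity, here is a minimal toy sketch (PyTorch-style autograd, my own naming, not the repo's code) of what "stopping the gradient from the sigma branch to the bbox branch" means: the uncertainty loss is computed on detached box deltas, so it can only update the sigma/alpha head.

```python
import torch

# Toy tensors standing in for the two head outputs (illustrative only).
pred_delta = torch.randn(8, 4, requires_grad=True)   # bbox regression output
pred_alpha = torch.randn(8, 4, requires_grad=True)   # log(sigma^2) output
target = torch.randn(8, 4)

err = torch.abs(pred_delta.detach() - target)        # gradient stopped here
alpha_loss = (torch.exp(-pred_alpha) * err + 0.5 * pred_alpha).mean()
alpha_loss.backward()

print(pred_delta.grad)  # None: nothing flowed back into the bbox branch
print(pred_alpha.grad)  # populated: only the uncertainty head is updated
```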
Thank you for your reply; that answers my question. There is another point that confuses me. KL Loss is used as the regression loss in the paper. Does that mean smooth L1 Loss is replaced by KL Loss, or are smooth L1 Loss and KL Loss used at the same time? I think the former is the answer, but the code confuses me because of the usage of ['loss_bbox'], which I take to be smooth L1 Loss, suggesting that smooth L1 Loss and KL Loss exist at the same time. https://github.com/yihui-he/KL-Loss/blob/master/detectron/modeling/fast_rcnn_heads.py#L175 Looking forward to your reply~
https://github.com/ethanhe42/Stronger-yolo-pytorch/blob/master/models/yololoss.py#L101 This is the code for the usage of KL Loss in YOLO, and it seems to confirm that KL Loss replaces smooth L1 Loss, so the two do not exist at the same time?
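For what it's worth, writing out the paper's regression term makes this clearer: with alpha = log(sigma^2), the loss is exp(-alpha) times the smooth L1 error plus an alpha/2 penalty, so smooth L1 is folded inside the KL term rather than kept as a separate loss; a 'loss_bbox' key may simply name this reweighted term. A minimal sketch (my own naming, not the repo's code):

```python
import torch

def kl_regression_loss(pred_delta, pred_alpha, target_delta):
    """Sketch of the paper's regression loss with alpha = log(sigma^2).

    It is smooth L1 reweighted by exp(-alpha), plus an alpha/2 penalty;
    with alpha = 0 (sigma = 1) it reduces to plain smooth L1, so KL Loss
    replaces smooth L1 rather than being added on top of it.
    """
    err = torch.abs(pred_delta - target_delta)
    smooth_l1 = torch.where(err < 1.0, 0.5 * err ** 2, err - 0.5)
    return (torch.exp(-pred_alpha) * smooth_l1 + 0.5 * pred_alpha).mean()
```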