KL-Loss
why stop gradient?
Hello, may I ask why the gradient from the sigma prediction branch to the bbox branch is stopped in the code? If stopping the gradient of the sigma prediction branch is necessary, consider that a 2022 article proposed bridging the gap between the Gaussian distribution and the Dirac delta distribution by raising the distribution to a power of the IoU, addressing the fact that the two cannot be made exactly equivalent. In that case, does the gradient transmitted by the sigma loss also need to be stopped?
tbh, I don't exactly remember the details. You can try removing stopgrad and compare the performance.
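For clarity, here is a minimal toy sketch (PyTorch-style autograd, my own naming, not the repo's code) of what "stopping the gradient from the sigma branch to the bbox branch" means: the uncertainty loss is computed on detached box deltas, so it can only update the sigma/alpha head.

```python
import torch

# Toy tensors standing in for the two head outputs (illustrative only).
pred_delta = torch.randn(8, 4, requires_grad=True)   # bbox regression output
pred_alpha = torch.randn(8, 4, requires_grad=True)   # log(sigma^2) output
target = torch.randn(8, 4)

err = torch.abs(pred_delta.detach() - target)        # gradient stopped here
alpha_loss = (torch.exp(-pred_alpha) * err + 0.5 * pred_alpha).mean()
alpha_loss.backward()

print(pred_delta.grad)  # None: nothing flowed back into the bbox branch
print(pred_alpha.grad)  # populated: only the uncertainty head is updated
```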
Thank you for your reply; that answers my question. There is another point that confuses me. KL Loss is used as the regression loss in the paper. Does that mean smooth L1 Loss is replaced by KL Loss, or are smooth L1 Loss and KL Loss used at the same time? I think the former is the answer, but the code confuses me because of the usage of ['loss_bbox'], which I take to be smooth L1 Loss, suggesting that smooth L1 Loss and KL Loss exist at the same time. https://github.com/yihui-he/KL-Loss/blob/master/detectron/modeling/fast_rcnn_heads.py#L175 Looking forward to your reply~
https://github.com/ethanhe42/Stronger-yolo-pytorch/blob/master/models/yololoss.py#L101 This is the code for the usage of KL Loss in YOLO, and it seems to confirm that KL Loss replaces smooth L1 Loss, so the two do not exist at the same time?
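For what it's worth, writing out the paper's regression term makes this clearer: with alpha = log(sigma^2), the loss is exp(-alpha) times the smooth L1 error plus an alpha/2 penalty, so smooth L1 is folded inside the KL term rather than kept as a separate loss; a 'loss_bbox' key may simply name this reweighted term. A minimal sketch (my own naming, not the repo's code):

```python
import torch

def kl_regression_loss(pred_delta, pred_alpha, target_delta):
    """Sketch of the paper's regression loss with alpha = log(sigma^2).

    It is smooth L1 reweighted by exp(-alpha), plus an alpha/2 penalty;
    with alpha = 0 (sigma = 1) it reduces to plain smooth L1, so KL Loss
    replaces smooth L1 rather than being added on top of it.
    """
    err = torch.abs(pred_delta - target_delta)
    smooth_l1 = torch.where(err < 1.0, 0.5 * err ** 2, err - 0.5)
    return (torch.exp(-pred_alpha) * smooth_l1 + 0.5 * pred_alpha).mean()
```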