robust_loss_pytorch
why use the log function to regularize the scale?
Hi, I have a question about the implementation. In the Distribution().nllfun method, why do you use the log function to regularize the scale so that it decreases? I would have expected an L2 or L1 penalty, which is more common.
https://github.com/jonbarron/robust_loss_pytorch/blob/9831f1db8006105fe7a383312fba0e8bd975e7f6/robust_loss_pytorch/distribution.py#L208
Log(scale) shouldn't be thought of as a regularizer; it's the log of the partition function of a probability distribution. Basically, this is not a "design decision" like L2 or L1 weight decay --- it ensures that the PDF implied by the loss function, viewed as a negative log-likelihood, integrates to 1, and it's the only thing you can minimize here that does that.
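A minimal sketch of this point (not the repo's code), using the Gaussian special case alpha = 2 where the partition function has a closed form: the NLL is the loss rho(x, 2, c) = 0.5*(x/c)^2 plus log(c) plus log(sqrt(2*pi)), and that log(scale) term is exactly what makes exp(-NLL) a normalized density.

```python
import math
import torch

def gaussian_nll(x, scale):
    # Special case alpha = 2 of the general robust loss: rho(x, 2, c) = 0.5 * (x / c)^2.
    # The NLL adds log(scale) plus log(sqrt(2*pi)) -- the log partition function --
    # which is what makes exp(-nll) integrate to 1 over x.
    rho = 0.5 * (x / scale) ** 2
    log_partition = torch.log(scale) + 0.5 * math.log(2.0 * math.pi)
    return rho + log_partition

x = torch.tensor([0.3, -1.2, 2.5])
scale = torch.tensor(0.7)
nll = gaussian_nll(x, scale)
# Matches the exact Gaussian NLL, confirming log(scale) is the normalizer,
# not a regularization choice.
print(torch.allclose(nll, -torch.distributions.Normal(0.0, scale).log_prob(x)))  # True
```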
Ok, I see, thank you very much. Another question: I see that the adaptiveness can be realized through the negative log-likelihood in Equation (16). However, why is that reasonable? I note that you have a qualitative analysis on the first page and in Figure 2, but what is the fundamental theory behind the idea?
This is a good idea if 1) you believe in maximum likelihood estimation (https://en.wikipedia.org/wiki/Maximum_likelihood_estimation) and 2) you want to maximize the likelihood of the observed data you're training on.
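A toy sketch of that principle (again using the Gaussian alpha = 2 special case rather than the paper's general distribution): minimizing the mean NLL over the scale by gradient descent recovers the maximum-likelihood scale, which for this case is just the RMS of the residuals. The same mechanism, applied to the general NLL, is what lets the shape and scale adapt to the data.

```python
import math
import torch

torch.manual_seed(0)
residuals = torch.randn(1000) * 0.5  # synthetic residuals with true scale ~0.5

log_scale = torch.zeros(1, requires_grad=True)  # optimize log(scale) so scale stays positive
opt = torch.optim.Adam([log_scale], lr=0.05)
for _ in range(500):
    scale = log_scale.exp()
    # Mean NLL of the Gaussian special case: rho + log(scale) + log(sqrt(2*pi)).
    nll = (0.5 * (residuals / scale) ** 2 + scale.log() + 0.5 * math.log(2.0 * math.pi)).mean()
    opt.zero_grad()
    nll.backward()
    opt.step()

# The minimizer of the mean NLL is the maximum-likelihood scale,
# i.e. the RMS of the residuals in the Gaussian case.
print(log_scale.exp().item(), residuals.pow(2).mean().sqrt().item())
```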
Wonderful! Thank you very much.