Can you explain why the energy is divided by b0?
In the following code for computing the (unnormalized) log probability, the network output is divided by `b0`:
https://github.com/ruiqigao/recovery_likelihood/blob/c77cc0511dedcb8d9ab928438d80acb62aeca96f/model.py#L154
I wonder if there is a legitimate explanation for this division. `b0` is supposed to be `step_size_square`, which usually has a very small value:
https://github.com/ruiqigao/recovery_likelihood/blob/c77cc0511dedcb8d9ab928438d80acb62aeca96f/model.py#L184
I wonder if dividing by this `b0` makes the gradient too large and harms training in some settings.
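To make the concern concrete, here is a minimal sketch of the scaling (the toy `f` stands in for the network output; this is not the repository's code): with `b0 = step_size ** 2`, the gradient of `f(x) / b0` is `1 / step_size ** 2` times the gradient of `f(x)`.

```python
import torch

# Hypothetical stand-in for the network output; not the repository's model.
f = lambda x: -0.5 * (x ** 2).sum()

step_size = 0.01
b0 = step_size ** 2                           # i.e. step_size_square

x = torch.randn(4, requires_grad=True)
neg_energy = f(x) / b0                        # the division in question
grad = torch.autograd.grad(neg_energy, x)[0]
# grad equals grad_f / b0, i.e. 10,000x larger than grad_f when step_size = 0.01.
```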
I think that's the scaling trick explained in "On the Anatomy of MCMC-Based Maximum Likelihood Learning of Energy-Based Models"; see Appendix A here.
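For what it's worth, here is a minimal sketch of why the division is typically harmless for sampling, assuming a standard Langevin update rather than the repository's exact sampler (names here are illustrative, not the repo's API): the drift term multiplies the gradient of the log density by `step_size ** 2 / 2`, so the `1 / b0` factor cancels and the effective gradient step on the network output is just `1/2`.

```python
import torch

def langevin_step(x, f, step_size):
    """One Langevin update for log p(x) proportional to f(x) / b0, b0 = step_size ** 2 (sketch)."""
    b0 = step_size ** 2
    x = x.detach().requires_grad_(True)
    log_p = (f(x) / b0).sum()                  # energy scaled by 1 / b0
    grad = torch.autograd.grad(log_p, x)[0]    # = grad_f / step_size ** 2
    # Drift term: (step_size ** 2 / 2) * grad log p = grad_f / 2,
    # so the 1 / b0 factor cancels and the update does not blow up for small step sizes.
    noise = step_size * torch.randn_like(x)
    return (x + 0.5 * step_size ** 2 * grad + noise).detach()
```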