Can you explain why the energy is divided by b0?
In the following code for computing the (unnormalized) log probability, the network output is divided by `b0`:
https://github.com/ruiqigao/recovery_likelihood/blob/c77cc0511dedcb8d9ab928438d80acb62aeca96f/model.py#L154
I wonder if there is a legitimate explanation for this division. `b0` is supposed to be `step_size_square`, which usually has a very small value:
https://github.com/ruiqigao/recovery_likelihood/blob/c77cc0511dedcb8d9ab928438d80acb62aeca96f/model.py#L184
I wonder if dividing by this `b0` makes the gradient too large and harms training in some settings.
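To make the concern concrete, here is a minimal sketch of the scaling (the toy `f` stands in for the network output; this is not the repository's code): with `b0 = step_size ** 2`, the gradient of `f(x) / b0` is `1 / step_size ** 2` times the gradient of `f(x)`.

```python
import torch

# Hypothetical stand-in for the network output; not the repository's model.
f = lambda x: -0.5 * (x ** 2).sum()

step_size = 0.01
b0 = step_size ** 2                           # i.e. step_size_square

x = torch.randn(4, requires_grad=True)
neg_energy = f(x) / b0                        # the division in question
grad = torch.autograd.grad(neg_energy, x)[0]
# grad equals grad_f / b0, i.e. 10,000x larger than grad_f when step_size = 0.01.
```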
I think that's the scaling trick explained in "On the Anatomy of MCMC-Based Maximum Likelihood Learning of Energy-Based Models"; see Appendix A here.
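For what it's worth, here is a minimal sketch of why the division is typically harmless for sampling, assuming a standard Langevin update rather than the repository's exact sampler (names here are illustrative, not the repo's API): the drift term multiplies the gradient of the log density by `step_size ** 2 / 2`, so the `1 / b0` factor cancels and the effective gradient step on the network output is just `1/2`.

```python
import torch

def langevin_step(x, f, step_size):
    """One Langevin update for log p(x) proportional to f(x) / b0, b0 = step_size ** 2 (sketch)."""
    b0 = step_size ** 2
    x = x.detach().requires_grad_(True)
    log_p = (f(x) / b0).sum()                  # energy scaled by 1 / b0
    grad = torch.autograd.grad(log_p, x)[0]    # = grad_f / step_size ** 2
    # Drift term: (step_size ** 2 / 2) * grad log p = grad_f / 2,
    # so the 1 / b0 factor cancels and the update does not blow up for small step sizes.
    noise = step_size * torch.randn_like(x)
    return (x + 0.5 * step_size ** 2 * grad + noise).detach()
```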