
Evaluating the gradient within the log probability function?

nmudur opened this issue 1 year ago · 0 comments

I have a neural-network-based log probability function $\log p_{NN}(\theta \mid x, \vec{t})$. If I increase the size of $\vec{t}$, my code essentially creates a batch of $x$ repeated $\mathrm{len}(\vec{t})$ times. While I am able to refactor my code so that I compute this log probability in smaller batches of $\mathrm{len}(\vec{t}_{BATCH})$ and add the results, I still run into memory issues when computing the gradient.
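To make the batching concrete, here is a minimal sketch of the kind of chunked evaluation I mean. `log_prob_nn` and its signature are stand-ins for my actual network, not hamiltorch API:

```python
import torch

def batched_log_prob(theta, x, t, log_prob_nn, batch_size=2):
    """Accumulate log p(theta | x, t) over chunks of t, so the full
    len(t)-sized batch of repeated x is never materialised at once.
    `log_prob_nn(theta, x_rep, t_chunk)` is a hypothetical stand-in
    for the neural log density."""
    total = torch.zeros((), dtype=theta.dtype)
    for t_chunk in torch.split(t, batch_size):
        # x is repeated only len(t_chunk) times instead of len(t) times
        x_rep = x.expand(len(t_chunk), *x.shape)
        total = total + log_prob_nn(theta, x_rep, t_chunk).sum()
    return total
```

This keeps the forward pass within memory, but every chunk's computation graph is still retained until the final backward pass, which is where the gradient memory issue comes from.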

I further refactored the log probability function to accumulate the gradients while evaluating the log probability, so that it returns the log probability as well as its gradient with respect to the parameters $\theta$. However, the `pass_grad` argument seems to accommodate only a constant tensor, or a function that returns a tensor of dimension D. The NN-based log probability is also stochastic, so I can't wrap the gradient as a separate function and pass it in separately.
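The accumulation I'm describing looks roughly like the sketch below: each chunk's graph is built, differentiated, and freed before the next chunk, so the peak memory is that of a single chunk. Again, `log_prob_nn` is a hypothetical placeholder for my network:

```python
import torch

def log_prob_and_grad(theta, x, t, log_prob_nn, batch_size=2):
    """Return the log probability and its gradient w.r.t. theta,
    accumulating both chunk by chunk so only one chunk's autograd
    graph is alive at a time. `log_prob_nn` is an illustrative
    stand-in, not part of hamiltorch."""
    logp = torch.zeros(())
    grad = torch.zeros_like(theta)
    for t_chunk in torch.split(t, batch_size):
        # Fresh leaf per chunk: backward through this chunk only
        th = theta.detach().requires_grad_(True)
        x_rep = x.expand(len(t_chunk), *x.shape)
        lp = log_prob_nn(th, x_rep, t_chunk).sum()
        (g,) = torch.autograd.grad(lp, th)
        logp = logp + lp.detach()
        grad = grad + g
    return logp, grad
```

The difficulty is that hamiltorch expects a scalar-returning log-prob function and computes the gradient itself, so a function returning a `(logp, grad)` pair like this doesn't fit the current interface.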

I would ideally like to restructure the code so that it evaluates the gradients at the same time as the log probability. I was going to modify my local hamiltorch package to do this, but I first thought I'd check whether the package already has a function that handles this, or whether there's a better workaround, in case other users have encountered this before.

nmudur avatar Mar 10 '24 21:03 nmudur