Get gradients working with uncollapsed maker SPs
Currently, the derivative information for the output's influence on the latent variable, and both of those influences on the inputs, is simply lost.
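To make the lost quantities concrete, here is a plain-Python sketch of an uncollapsed beta-bernoulli maker. The class `UCBetaBernoulliSketch` and its methods are hypothetical illustrations, not Venture's actual SP interface: the maker's inputs are the hyperparameters, the made SP carries a continuous latent weight, and the derivatives named below are the ones that currently get dropped.

```python
import math
import random
from scipy.special import digamma

class UCBetaBernoulliSketch(object):
    """Hypothetical stand-in for an uncollapsed beta-bernoulli made SP."""

    def __init__(self, alpha, beta):
        self.alpha, self.beta = alpha, beta            # maker inputs (hyperparameters)
        self.weight = random.betavariate(alpha, beta)  # continuous latent state

    def logp_output(self, x):
        """log p(x | weight) for one boolean application."""
        return math.log(self.weight if x else 1.0 - self.weight)

    def d_logp_output_d_latent(self, x):
        """Output -> latent: d log p(x | weight) / d weight (currently lost)."""
        return 1.0 / self.weight if x else -1.0 / (1.0 - self.weight)

    def d_logp_latent_d_inputs(self):
        """Latent -> inputs: d log p(weight | alpha, beta) / d (alpha, beta)
        under the Beta(alpha, beta) prior (also currently lost)."""
        a, b, w = self.alpha, self.beta, self.weight
        d_alpha = math.log(w) - digamma(a) + digamma(a + b)
        d_beta = math.log(1.0 - w) - digamma(b) + digamma(a + b)
        return d_alpha, d_beta

# Illustrative use: both derivative paths are well defined, but nowhere to go.
sp = UCBetaBernoulliSketch(2.0, 3.0)
print(sp.d_logp_output_d_latent(True), sp.d_logp_latent_d_inputs())
```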
Issues:
- Is it reasonable to take gradient steps on the made SP itself (if it has continuous latent state)?
- Gradient of what, exactly? The `AAALKernel`s associated with stochastic makers take Gibbs steps, so the gradient of their weight is always 0.
- Perhaps we need an explicit notion of `logDensityOfData` even for stochastic made SPs, in order to be able to take gradient steps with respect to it. (On a simplex, in some cases!) See the sketch after this list.
- On the other hand, what is the desired behavior if an uncollapsed AAA maker (that takes Gibbs steps) is a non-principal resampling node in a gradient-based proposal?
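As a hedged sketch of what an explicit `logDensityOfData` could mean for the beta-bernoulli case (the functions `log_density_of_data` and `log_density_sequential` below are illustrations, not claims about the existing implementation): with the latent weight integrated out, the density of the observed applications is the standard beta-bernoulli marginal, which is a smooth function of the hyperparameters even though the Gibbs `AAALKernel`'s weight is identically 0.

```python
import math
from scipy.special import betaln

def log_density_of_data(num_true, num_false, alpha, beta):
    """Log marginal probability of an exchangeable sequence of boolean
    applications, with the latent weight integrated out (no binomial
    coefficient, since the order of the applications is fixed)."""
    return betaln(alpha + num_true, beta + num_false) - betaln(alpha, beta)

def log_density_sequential(obs, alpha, beta):
    """Same quantity accumulated from one-step predictive probabilities,
    as a sanity check on the closed form above."""
    logp, heads, tails = 0.0, 0, 0
    for x in obs:
        p_true = (alpha + heads) / (alpha + beta + heads + tails)
        logp += math.log(p_true if x else 1.0 - p_true)
        if x:
            heads += 1
        else:
            tails += 1
    return logp

obs = [True, True, False, True]
assert abs(log_density_of_data(3, 1, 2.0, 2.0)
           - log_density_sequential(obs, 2.0, 2.0)) < 1e-9
```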
The commentary on #455 hints at a way to address this. When the uncollapsed kernel understands the local posterior well enough, it should be able to report that posterior's size as a measure, that is, the total mass of the unnormalized local posterior, which is the marginal likelihood of the data; this is what `logDensityOfData` computes. We could then choose to move the input (hyper-)parameters according to that feedback.
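Under that reading, "moving the input (hyper-)parameters according to that feedback" could look like gradient ascent on the reported marginal likelihood. The sketch below differentiates the beta-bernoulli `log_density_of_data` from the previous sketch analytically; the function `grad_log_density_of_data`, the data counts, the step size, and the iteration count are all illustrative choices, not tied to Venture's actual inference programs.

```python
from scipy.special import betaln, digamma

def log_density_of_data(num_true, num_false, alpha, beta):
    """Beta-bernoulli log marginal likelihood (same as the earlier sketch)."""
    return betaln(alpha + num_true, beta + num_false) - betaln(alpha, beta)

def grad_log_density_of_data(num_true, num_false, alpha, beta):
    """Analytic d/d(alpha, beta) of the log marginal likelihood above."""
    total = alpha + beta + num_true + num_false
    d_alpha = (digamma(alpha + num_true) - digamma(total)
               - digamma(alpha) + digamma(alpha + beta))
    d_beta = (digamma(beta + num_false) - digamma(total)
              - digamma(beta) + digamma(alpha + beta))
    return d_alpha, d_beta

# A few illustrative ascent steps on the hyperparameters, driven only by the
# made SP's reported data density (arbitrary step size; hyperparameters are
# clamped to stay positive).
alpha, beta, step = 1.0, 1.0, 0.5
for _ in range(5):
    d_alpha, d_beta = grad_log_density_of_data(7, 3, alpha, beta)
    alpha = max(alpha + step * d_alpha, 1e-6)
    beta = max(beta + step * d_beta, 1e-6)
    print(alpha, beta, log_density_of_data(7, 3, alpha, beta))
```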