pymc-experimental
Implement some predictive model transformations
Companion to #111.
Exploring some possibilities and also understanding the limitations/design issues.
This is just a draft to guide design decisions for now.
Add a transformation called impute that takes any random variable and replaces it with a TensorConstant of the appropriate size and dtype. This, along with observe, will serve as the basis for do-calculus and would open the door to practically all of causal inference for us.
Any preference for a TensorConstant instead of a SharedVariable? The latter can be modified without needing to reconstruct the graph.
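To make the idea concrete, here is a minimal sketch in plain PyTensor (not the API proposed in this PR; clone_replace is just a stand-in for whatever the transform ends up doing), which also illustrates the SharedVariable alternative:
import numpy as np
import pytensor
import pytensor.tensor as pt
from pytensor.tensor.random.basic import normal

x = normal(0, 1)
y = normal(x, 1)

# "impute": swap the upstream RV for a constant of the same dtype;
# downstream variables now depend on the fixed value instead of x
x_const = pt.constant(np.asarray(1.5, dtype=x.dtype))
y_do = pytensor.clone_replace([y], replace={x: x_const})[0]
y_do.eval()  # a draw from Normal(1.5, 1)

# SharedVariable alternative: the value can be changed later without rebuilding the graph
x_shared = pytensor.shared(np.asarray(1.5))
y_do_shared = pytensor.clone_replace([y], replace={x: x_shared})[0]
x_shared.set_value(2.5)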
Following up here with the conversation I had with @ricardoV94 on discourse so it doesn't get lost --
Another nice transformation would be one that takes in an MvN distribution and marginalizes over observed data, returning a new conditional distribution over the set complement to the observations. It would look something like this:
from pytensor.tensor.random.basic import multivariate_normal, MvNormalRV
from pytensor.tensor.nlinalg import matrix_dot
import pytensor.tensor as pt
import pymc as pm
import numpy as np


def marginalize_over_observations(dist, observations):
    if not isinstance(dist.owner.op, MvNormalRV):
        raise NotImplementedError

    # Is this always true?
    *_, mu, cov = dist.owner.inputs

    observations = pt.as_tensor_variable(observations)
    is_nan = pt.isnan(observations)

    missing_idx = pt.nonzero(is_nan, return_matrix=False)[0]
    obs_idx = pt.nonzero(pt.neq(is_nan, np.array(True)), return_matrix=False)[0]

    n_missing = missing_idx.shape[0]
    n_obs = obs_idx.shape[0]
    n = n_missing + n_obs

    # Z is a map from R_n to R_observed
    # K is a map from R_n to R_missing
    Z = pt.set_subtensor(pt.zeros((n_obs, n))[pt.arange(n_obs), obs_idx], 1)
    K = pt.set_subtensor(pt.zeros((n_missing, n))[pt.arange(n_missing), missing_idx], 1)

    resid = observations[obs_idx] - Z @ mu

    # TODO: Benchmark this vs indexing on a large problem, gains from BLAS?
    cov_uu = matrix_dot(K, cov, K.T)
    cov_uo = matrix_dot(K, cov, Z.T)
    cov_oo = matrix_dot(Z, cov, Z.T)

    # This could be sped up if we got chol instead of cov, because we could use solve_triangular
    # twice to invert. The typical MvN parameterization is in terms of chol, so there's a lot of
    # waste going back and forth over and over?
    cov_oo_inv = pt.linalg.solve(cov_oo, pt.eye(n_obs), assume_a="pos", check_finite=False)

    beta = cov_uo @ cov_oo_inv

    # Conditional mean of the missing entries: mu_missing + beta @ (obs - mu_obs)
    conditional_mu = K @ mu + beta @ resid
    conditional_cov = cov_uu - matrix_dot(beta, cov_oo, beta.T)

    return multivariate_normal(mean=conditional_mu,
                               cov=conditional_cov,
                               dtype=dist.dtype)
Simple usage:
observation = np.array([np.nan, 1, np.nan])
L = np.random.normal(size=(3, 3))
cov = L @ L.T
d = pm.MvNormal.dist(mu=np.zeros(3), cov=cov)
d2 = marginalize_over_observations(d, observation)
d2.eval().shape
>>> Out: (2,)
This could then be applied to a model with 1) a list of MvN RVs to rewrite, and 2) a list of observations to condition them on. I will have a look at how the other functions in this PR work and see if I can have a try, but I actually don't know how to PR into a PR :>
@jessegrabowski you can push directly to my branch if you have the git permissions. Otherwise you can fork my branch and open a PR with that as a base. If that doesn't work just open a PR directly on main because I will have the permissions to work on it anyway.
About the Cholesky thing: we added some rewrites recently in PyTensor that work as long as the right "tags" are on the variables (e.g. "lower_triangular").
You can check it in: https://github.com/pymc-devs/pymc/pull/6736/commits/ee1657be26bbe4403cbf4f4faa114b348bb542f5
https://github.com/pymc-devs/pytensor/pull/303
Would those cover the case here? If not, can we add the rewrite in PyTensor?
About the observations: could we just request the new ones and combine them with the old observed values from the model/trace?
The rewrite handles my quip about going back and forth, since it looks like the canonical MvN will now be parameterized by the Cholesky-factorized covariance matrix? I was surprised to see that if you give pytensor.tensor.random.basic.multivariate_normal a Cholesky-factorized cov, it computes cov = L @ L.T; I assumed it would be the other way around (if you give cov, it computes L = cholesky(cov)). Anyway, my point in the comment was that if we always know we can get a Cholesky-factorized covariance matrix out of dist.owner.inputs, then we can use a more efficient algorithm for matrix inversion. Or check that tag and do a switch?
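For reference, a hedged sketch of what that would look like: if cov_oo were available as a lower-triangular Cholesky factor L_oo (cov_oo = L_oo @ L_oo.T), then beta = cov_uo @ inv(cov_oo) can be formed with two triangular solves instead of building the explicit inverse (the function name here is just illustrative):
from pytensor.tensor.slinalg import solve_triangular


def beta_from_chol(cov_uo, L_oo):
    # forward substitution: A = inv(L_oo) @ cov_uo.T
    A = solve_triangular(L_oo, cov_uo.T, lower=True)
    # back substitution: B = inv(L_oo.T) @ A = inv(cov_oo) @ cov_uo.T
    B = solve_triangular(L_oo.T, A, lower=False)
    # beta = cov_uo @ inv(cov_oo)
    return B.T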
One last thing I just thought of: this "conditionalization" has implications for missing data imputation in MvN likelihoods -- the missing values should be filled with these conditional distributions. I know you just did a PR that brought back multivariate imputation, but I didn't look carefully to see what goes into the missing values now.
The default parametrization of the MvNormal is still the covariance. What we did was add rewrites to remove useless operations when you tag that some variables were already lower triangular (which the outputs of LKJ are tagged with).
You should still write the code expecting a full covariance. The question is whether the rewrites we have right now would go from the full-covariance graph to the optimized one, as long as the tags are provided.
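For concreteness, a small sketch of the tagging idea (assuming the rewrites key off tags like "lower_triangular", as in the PyTensor PR linked above; whether they cover this exact pattern is the open question):
import pytensor.tensor as pt
from pytensor.tensor.slinalg import cholesky

L = pt.matrix("L")
L.tag.lower_triangular = True  # promise that L is already lower triangular
cov = L @ L.T                  # full-covariance graph, written as usual
chol = cholesky(cov)           # a tag-aware rewrite could recover L here without refactorizing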
If not, we could open an issue on PyTensor to do it. Maybe @dehorsley would be interested in also implementing those?
About the imputation: yes, in that case you shouldn't need anything else, but it can be pretty inefficient to sample the variables with NUTS, so it's nice if we can get them afterwards using the conditioning algebra.
CC @bwengals as this is pretty much the same thing with GPs