
Do not reference model variables in GP internals

Open ricardoV94 opened this issue 2 years ago • 8 comments

Description

The model fgraph functionality (used by do and observe) doesn't work with GP variables, because those are not represented as pure PyTensor objects and instead hold references to the old model's variables in their instance scope.

ricardoV94 avatar Aug 30 '23 07:08 ricardoV94

Can we refactor the code so that it works correctly?

ferrine avatar Sep 12 '23 17:09 ferrine

It's not easy, we need to figure out how to represent GPs as PyTensor variables/Ops.

Some very rough (and likely bad) ideas in https://github.com/pymc-devs/pymc-experimental/blob/2084e167486a42b313abcec77958d3a24e001ae3/pymc_experimental/gp/GPs%20from%20scratch.ipynb

ricardoV94 avatar Sep 12 '23 17:09 ricardoV94

There's actually no GP object saved in the model, so no good way to raise an error.

The problem is that GP objects hold references to model variables, so if you tried to do a gp.conditional after cloning a model (which happens whenever you use do/observe) you would be re-introducing the old model variables instead of referring to the cloned ones.

import pymc as pm
import numpy as np
from pytensor.graph import ancestors
from pymc.model.fgraph import clone_model

with pm.Model() as model:
    ell = pm.Gamma("ell", alpha=2, beta=1)
    cov = 2 * pm.gp.cov.ExpQuad(1, ell)
    gp = pm.gp.Latent(cov_func=cov)
    f = gp.prior("f", X=np.arange(10)[:, None])
    pm.Normal("y", f * 2)

assert ell in ancestors(f)

cloned_model = clone_model(model)
with cloned_model:
    f_cloned = gp.prior("f_cloned", X=np.arange(10)[:, None])
    pm.Normal("y_cloned", f_cloned * 2)

assert model["ell"] in ancestors(f_cloned)  # bad: the old model's ell is still used
assert cloned_model["ell"] not in ancestors(f_cloned)  # bad: the clone's ell is bypassed
cloned_model.logp()  # ValueError: Random variables detected in the logp graph: {ell}.

I see two ways of fixing this:

  1. Use an approach more like the statespace module in pymc-experimental, which retrieves all variables by name from the model context when they are needed (is that correct @jessegrabowski?)
  2. Try to define GPs as PyTensor objects/operations (this is not trivial, but could be interesting in the long term). See link above.

ricardoV94 avatar Oct 05 '23 14:10 ricardoV94

Yes, the state space models have a list of required parameters, and it does name matching to model RVs when you "build" the model.

The challenge with that approach is that the statespace model needs to know ahead of time which variables to look for. It ends up being really inflexible, because users have to conform to the demands of the model rather than the (preferable) other way around.

The core of the GP is just the conditioning operation. I think what could be interesting is to define something like pm.condition(X, y), which would ask the RV X to return a conditional version of itself, given data y. I guess this isn't defined in most cases, so it could just raise on those. But when called on a MvNormal(mean_func, cov_func), it would give back the GP as a symbolic random variable that could be worked with "as usual"?
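For intuition, here is a NumPy-only sketch of the closed-form MvNormal conditioning that such an operation would wrap (the same math gp.conditional computes). Note that pm.condition does not exist in PyMC; this is just the underlying formula, not an API proposal:

```python
import numpy as np

def exp_quad(X1, X2, ell):
    # ExpQuad (RBF) covariance between two sets of 1-d inputs
    d = X1[:, None] - X2[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

ell = 0.25
X = np.linspace(0, 1, 5)          # conditioning ("training") inputs
Xnew = np.array([0.0, 0.5, 1.0])  # prediction inputs (here a subset of X)
y = np.sin(2 * np.pi * X)         # observed values of f at X

K = exp_quad(X, X, ell) + 1e-8 * np.eye(len(X))  # jitter for stability
Ks = exp_quad(Xnew, X, ell)
Kss = exp_quad(Xnew, Xnew, ell)

# Standard MvNormal conditioning:
#   f* | f = y  ~  N(Ks K^{-1} y,  Kss - Ks K^{-1} Ks^T)
mu_cond = Ks @ np.linalg.solve(K, y)
cov_cond = Kss - Ks @ np.linalg.solve(K, Ks.T)
```

Since Xnew coincides with training points here, the conditional mean interpolates y and the conditional variance collapses to (almost) zero, which is exactly the "return a conditional version of itself" behavior described above.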

jessegrabowski avatar Oct 05 '23 14:10 jessegrabowski

Breaking the do-operator puts a bit more urgency on this idea now

bwengals avatar Oct 19 '23 07:10 bwengals

I'm wondering if rewriting the GP lib as PyTensor Ops is the only way forward here. It's probably easier to make do and observe work with existing GPs than the other way around (just going by lines of code). Though there are other interesting things I think you'd gain from GPs in PyTensor.

which retrieves all variables by name from the model context

It wouldn't be hard to store GP objects in the model. It's not as slick, but could they be registered and then reproduced the same way random variables and deterministics are?

I think designing a hierarchical normal model Op would be a simpler, univariate version of what a GP Op would need to do:

  • gp.conditional -> predicting on new groups
  • gp.prior -> choosing centered or non-centered

I dunno what an Op-ified version of a GP would look like, but a hierarchical normal model that follows the current GP API would look like this:

mu = pm.Normal("mu", ...)
sigma = pm.HalfNormal("sigma", ...)
hier = pm.Hierarchical(mu=mu, sigma=sigma)  # hypothetical; plays the role of pm.gp.Latent
x = hier.prior("x", dims="group")   # (optionally) non-centers under the hood, like `gp.prior` does

and then for prediction

xnew = hier.new_group("xnew") # like gp.conditional

I'm not proposing we implement it exactly like that, just showing the similarity to explain why I think a PyTensor version of a hierarchical normal model might be a good first step towards a PyTensor version of a GP.

bwengals avatar Dec 10 '23 05:12 bwengals

It wouldn't be hard to store GP objects in the model. It's not as slick, but could they be registered and then reproduced the same way random variables and deterministics are?

They could, but that adds another layer of complexity: all Model methods and fgraph manipulation functions would then have to handle yet another kind of Model variable.

This is the existing logic, which is already pretty complex: https://github.com/pymc-devs/pymc/blob/01ddcb8f07334f816772781684c70b64fc6fe017/pymc/model/fgraph.py#L116-L277

We also used to have special rules for Minibatch variables, and it was much cleaner to just represent those as vanilla PyTensor objects: https://github.com/pymc-devs/pymc/pull/6523

As a first approach I would just do it with the name approach like the statespace models do: store the name of x/mu/sigma instead of the variable itself and retrieve it from the model context via model[name] (to ensure they are always there, they need to be model variables, data, or deterministics).

Representing stuff as PyTensor objects shouldn't be that hard, but it definitely sounds like it will take a while longer to find the right representation and it would be nice to fix the incompatible behavior we have right now.

ricardoV94 avatar Dec 10 '23 08:12 ricardoV94