arviz How should we use constant_data, specifically from PyMC3?

Short Description

The latest version of arviz has constant_data in the InferenceData object. I assume that this is intended to support plotting regressions by allowing us to add predictors to InferenceData.

As far as I can tell from reading the data, the only thing that will show up in this group is the set of variables in the model that are not contained in the set of default varnames. I don't see how this lets us load predictors. Unless possibly it is intended that we make all of the predictors be pm.Data objects.

Also, the user cannot set constant_data when building an InferenceData object from a prior predictive model, which means, I believe, that we won't be able to use arviz to plot when we do prior predictive samples with known predictors to examine our prior regression model.

I can think of some possible improvements:

We could enable the insertion of constant data for prior predictive and posterior predictive samples alone (i.e., without a posterior trace), by using pymc3's modelcontext(). This would enable someone to write code like:

with model:
    data = az.from_pymc3(None, prior=prior_trace)

This might, potentially, allow programmers to put coords and dims into the pymc3.Model object.

I don't believe the constant_data are necessarily constant across groups, are they? If one has a regression model, then probes it with prior predictive sampling or posterior predictive sampling, then the value of predictors might be different in the posterior, prior, and posterior_predictive, mightn't they? One could, of course, make different InferenceData objects for these cases, but then one runs into the problem that the only way to get constant_data populated is by providing a MultiTrace from which arviz can extract the Model object.

Sep 18 '19 21:09 rpgoldman

> As far as I can tell from reading the data, the only thing that will show up in this group is the set of variables in the model that are not contained in the set of default varnames. I don't see how this lets us load predictors. Unless possibly it is intended that we make all of the predictors be pm.Data objects.

Yes, the logic behind this is that if there are pm.Data variables that are not observed variables, they are probably predictors, so they are automatically loaded into the constant_data group. Therefore, iff you want the predictors to be loaded directly into the InferenceData object using only from_pymc3, they should all be pm.Data.

To store an arbitrary set of predictors, whether they are pm.Data or not, you would have to load them into a separate InferenceData object and then concatenate it with the result of from_pymc3.
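A minimal sketch of that workaround, assuming the ArviZ 0.x API (az.from_dict and az.concat). The posterior here is a random stand-in for the result of from_pymc3, and the names slope, x and obs_id are hypothetical:

```python
import numpy as np
import arviz as az

# Stand-in for the output of az.from_pymc3(trace): fake posterior draws
# shaped (chain, draw) for a hypothetical slope parameter.
rng = np.random.default_rng(0)
idata = az.from_dict(posterior={"slope": rng.normal(size=(2, 50))})

# Hypothetical predictor that is NOT a pm.Data variable in the model,
# stored in its own InferenceData object under the constant_data group.
x = np.linspace(0.0, 1.0, 10)
predictors = az.from_dict(
    constant_data={"x": x},
    dims={"x": ["obs_id"]},
)

# With no `dim` argument, az.concat merges objects whose groups do not overlap
# and returns a new InferenceData containing both groups.
idata = az.concat(idata, predictors)
print(idata.groups())
```

Because the two objects share no groups, the concatenation only adds constant_data next to the existing groups; nothing in the posterior is modified.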

Maybe we could add a constant_data argument to from_pymc3 to load a dict of predictors directly into the constant_data group, since I do not think data like sigma in the cookbook model is stored in any way by PyMC3. We can use this issue to discuss better integration of constant_data with PyMC3.

> I don't believe the constant_data are necessarily constant across groups, are they? If one has a regression model, then probes it with prior predictive sampling or posterior predictive sampling, then the value of predictors might be different in the posterior, prior, and posterior_predictive, mightn't they?

They may be different between prior run (prior and prior_predictive) and posterior run (posterior, observed data and posterior predictive) but then you would probably be working with different models given that the predictors are different, so it may be better to store them in different InferenceDatas.

Sep 19 '19 07:09 OriolAbril

> I don't believe the constant_data are necessarily constant across groups, are they? If one has a regression model, then probes it with prior predictive sampling or posterior predictive sampling, then the value of predictors might be different in the posterior, prior, and posterior_predictive, mightn't they?

> They may be different between prior run (prior and prior_predictive) and posterior run (posterior, observed data and posterior predictive) but then you would probably be working with different models given that the predictors are different, so it may be better to store them in different InferenceDatas.

Actually, I think that PyMC3's pm.Data objects are specifically designed to enable programmers to change the predictors and re-run the model.

If we are doing posterior predictive sampling with different predictors to, for example, generalize to conditions we have not seen, we cannot simply make a new, different model or a new InferenceData object, since the structure of the InferenceData object depends heavily on the structure of the posterior trace passed to from_pymc3().

Sep 19 '19 20:09 rpgoldman

@michaelosthege I think we can close this now that v4 will allow constant pm.Data so that they are added in InferenceData but don't represent a performance penalty due to being shared variables.

Nov 09 '21 00:11 OriolAbril

> @michaelosthege I think we can close this now that v4 will allow constant pm.Data so that they are added in InferenceData but don't represent a performance penalty due to being shared variables.

Was there a PR about this already?? I opened https://github.com/pymc-devs/pymc/issues/5105 , but I don't recall that it's been worked on yet.

Nov 09 '21 08:11 michaelosthege

> Was there a PR about this already??

I don't know. I just figured there would be an open issue/PR that would be better placed to track this than this one.

Nov 09 '21 11:11 OriolAbril