Posterior predictive revamp
Not safe to merge.
Do not merge.
Only to see how badly testing fails.
@danielturek I see you're cranking on this, which I think entails having predictive node sampling not affect sampling for the parameters.
You probably have thought about these, but note that NCT issues 285 and 269 have various ideas for consideration that may relate to what you're doing.
@paciorek Thanks for pointing that out. I'll address some of those in a message here later.
@perrydv @paciorek Let me describe (briefly) what this undertaking is, since I note it is not fully in-line with some of the thinking in NCT issues 285 and 269. Those discussions talk about a variety of different systems for handling PP (posterior predictive) nodes, including post-hoc sampling of PP nodes, and entirely different systems for managing samplers of PP vs. non-PP nodes.
In contrast, what I'm working on here stays within the current MCMC structure. Very simply: add functionality to `getNodeNames` and `getDependencies` to identify and return PP nodes. Then, in MCMC samplers (controlled by a system option), exclude PP nodes from the `calcNodes` mantra - excluding PP nodes from the calculations of acceptance probabilities, etc. (PP nodes are handled as currently, by the existing `posterior_pred` and `posterior_pred_branch` samplers.)
This is something I can get working. Note the "exact results" would not match the former results, when PP nodes exist in a model.
Is this undertaking OK? What do you guys think?
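For concreteness, the graph criterion behind "PP node" can be sketched in a few lines of base R on a toy dependency structure. This is an illustration only - the node names and adjacency-list representation here are hypothetical, not nimble's internal data structures:

```r
## Toy illustration (not nimble internals): a stochastic node is a
## posterior predictive (PP) node if it is not data and has no
## downstream stochastic dependents that are data.
children <- list(theta = "x", x = c("y", "pp1"), y = character(0),
                 pp1 = "pp2", pp2 = character(0))
isData <- c(theta = FALSE, x = FALSE, y = TRUE, pp1 = FALSE, pp2 = FALSE)

hasDataDescendant <- function(node) {
    kids <- children[[node]]
    if(length(kids) == 0) return(FALSE)
    any(isData[kids]) || any(vapply(kids, hasDataDescendant, logical(1)))
}

nodes <- names(children)
ppNodes <- nodes[!isData[nodes] & !vapply(nodes, hasDataDescendant, logical(1))]
ppNodes   # "pp1" "pp2"
```

Here `y` is data, so `theta` and `x` (its ancestors) are not PP, while `pp1` and `pp2` have no data downstream and are.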
Stoked that you are working on this - it's been a big drawback in efficiency for us (computational and MCMC) for a long time now.
I think that makes sense and we can build additional stuff later (e.g., only sampling PP nodes if they are monitored and at their thinning interval).
However, I think we need all PP samplers to run at the very end of the set of samplers. Otherwise, a PP sample at a given iteration could be inconsistent with a parameter that is updated later in the cycle over the samplers. And of course a danger is that a user could manually assign samplers that break this (but we could add checks to `addSampler`). You may well have already thought that through.
[Daniel is in disbelief that testing has passed]
@paciorek I think this warrants some discussion about the ordering of MCMC samplers (w.r.t. PP nodes). No, at present the MCMC does nothing to put the sampling of PP nodes last.
I'm uncertain about whether this is necessary, or important. For example, forgetting about PP nodes, if the model dependency structure is: theta -> x -> y, where the names imply parameter, latent state, data, then:
- Our default sampler order may well be 1:theta, 2:x, 3:record samples
- The samples recorded in the first MCMC iteration would be {theta1,x1}, where theta1 ~ [theta | x=x0], and x1 ~ [x | theta=theta1, y=y].
- Point being: yes, drawing the sample x1 is conditioned on the value theta=theta1. But drawing the sample theta1 (as recorded in the first MCMC iteration) is not conditioned on x1. It's conditioned on the previous value x=x0.
- This becomes even more muddled in more complex models, e.g., theta -> w -> x -> y.
- Regardless of the sampling order (whether it's 1:theta, 2:x, 3:record samples, or 1:x, 2:theta, 3:record samples), the posterior samples for {theta, x} will still be a valid sample from the posterior distribution [theta,x | y] - right? So the sampling order doesn't matter. And, accordingly, the MCMC doesn't do anything (or need to do anything) to check or enforce this. Users can reorder, add, remove, or interleave samplers as desired. No problem, as long as all un-observed dimensions get updated at some point.
So, bringing this discussion back to PP nodes, I feel all the same applies. Say our model structure is theta -> x -> PP. The default sampling order may well be 1:theta, 2:x, 3:PP, 4:record samples. And in that case, yes, the samples recorded for PP will be generated conditional upon the values of x in that same MCMC iteration. That is, we'll record the MCMC sample {theta1, x1, PP1}, where PP1 ~ [PP | x=x1]. So that's "nice" in some sense (it's consistent with the notion of post-processing).
But, my point is: if the sampler ordering were instead 1:theta, 2:PP, 3:x, 4:record samples, then we're still going to generate a valid draw from the target posterior [theta,x,PP | y], and I'm not sure we make any promises anywhere that the recorded samples of PP are generated using the values of other model nodes recorded in that same MCMC iteration. Yes, that's how samples of predictive nodes would be generated by any post-processing routine, but that's not what we're doing here.
Anyway, maybe I'm missing something, but please point out if (any of):
- My understanding of MCMC theory is incorrect
- We have any implicit guarantee or reason to generate PP samples in the same way as post-processing would
Short of that, it feels a little like micro-managing to me. (Noting that the default sampler ordering would always do what you suggest, anyway).
Here's my take -- in a standard MCMC, when one generates a PP sample, it is conditional on all the other parameters. Then if one samples a parameter later in the same iteration (before saving to `mvSamples`), it is conditional on the PP sample generated, so everything is OK. It's the usual Gibbs cycle.
If one omits the PP node from the conditioning when sampling a parameter, then if one were to sample the PP node before the parameter, the PP node will not be consistent with the current parameter value, because that parameter was updated unconditional on the PP node. The marginal for the PP node should be fine, but the joint for {parameter, PP node} will not be, I don't think.
David van Dyk did work a while back on partially marginalized samplers (the top 3 hits on Google Scholar for "van dyk partially collapsed gibbs" should find some papers) that we could look at for more insight.
@paciorek Very good discussion. I agree, your point about sampling parameters unconditional on the PP nodes does raise a different situation, outside of the standard Gibbs sampling I was thinking about. So I share your uncertainty about the resulting joint distribution.
How about: at the time of `buildMCMC`, reorder all samplers to put `posterior_predictive` and `posterior_predictive_branch` samplers at the end (in the same ordering in which they appeared in the configuration), and also issue a `[Note] ...` message for the user? What do you think?
Makes sense to me. I had been thinking of doing something at the time of `addSampler`, but your idea makes more sense.
@paciorek @perrydv I'm reasonably happy with this PR.
A quick description of the changes here:
- The following new options are added to both `getNodeNames` and `getDependencies`: `includePosteriorPred`, `posteriorPredOnly`, `includePosteriorPredBranch`, and `posteriorPredBranchOnly`. These options dictate the returning of posterior predictive nodes and posterior predictive branch nodes.
- There's a new system option: `MCMCincludePredictiveDependencies`. When `TRUE`, posterior predictive dependencies are included in MCMC calculation nodes - the same as the old behavior. When `FALSE` (which is the default in this PR), posterior predictive dependencies are excluded, and not used in sampler calculations.
- The determination of MCMC samplers' sets of "calculation nodes" is now abstracted into one function, `mcmc_determineCalcAndCopyNodes`, which determines all the necessary sets of MCMC sampler calculation and copy nodes. So that's where the new system option (`MCMCincludePredictiveDependencies`) is used - in one place, affecting all MCMC samplers, which have been modified to use this system.
- Addition of another system-level option, `MCMCreorderSamplersPosteriorPredLast`, with default value `TRUE`, described below.
- I also made a significant change to `configureMCMC`. Previously, depending on the graphical model structure, which nodes are data, and the ordering of declarations in the model code, it was easy to have situations where posterior predictive samplers are not last. I can describe these situations more if that's helpful, but the point being: when assigning samplers to all non-data stochastic nodes, sorted topologically, posterior predictive nodes can easily precede non-posterior-predictive nodes (only possible when they appear in different branches of the model graph). This PR includes a change which (if `MCMCreorderSamplersPosteriorPredLast` is `TRUE`), at the time of default MCMC configuration, reorders all posterior predictive samplers to be at the end. So, in any default MCMC, all posterior predictive samplers will appear at the end. In some cases, this changes the default sampler ordering.
- Also (if `MCMCreorderSamplersPosteriorPredLast` is `TRUE`), a new check is carried out at the time of `buildMCMC`. If a user has added additional samplers, or reordered the samplers, such that the posterior predictive samplers are no longer all at the end, then the samplers are reordered at MCMC-build-time to achieve this, and an informative message is issued if reordering takes place. So, all posterior predictive samplers operate last, before MCMC samples are collected.
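The build-time check and reordering described above amount to a stable partition of the sampler list: non-predictive samplers first, predictive samplers last, each group keeping its relative order. A minimal base R sketch (sampler names illustrative, not the actual `buildMCMC` code):

```r
## Illustrative sketch of the build-time check and stable reordering.
samplerNames <- c("conjugate_theta", "posterior_predictive_pp1",
                  "RW_x", "posterior_predictive_pp2")
isPP <- grepl("^posterior_predictive", samplerNames)

## Reordering is needed only if some PP sampler precedes a non-PP sampler.
needsReorder <- any(isPP) && any(!isPP) && min(which(isPP)) < max(which(!isPP))
if(needsReorder)
    message("  [Note] Reordering samplers to place posterior predictive samplers last")

## Stable partition: both groups preserve their original relative order.
samplerNames <- c(samplerNames[!isPP], samplerNames[isPP])
samplerNames
```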
Since this is a major change touching many moving parts, I tried to do extensive testing:

- Tests that the new arguments to `getNodeNames` and `getDependencies` all work correctly, in a variety of situations regarding data nodes and posterior predictive nodes.
- Tests for asymptotic correctness of posteriors, which exercise all the sampler functions that were modified, executing them with and without the new option set, in the presence of posterior predictive dependencies. All posteriors agree well.
- Tests of the reordering mechanism for MCMC samplers, both the reordering done in the default configuration in `configureMCMC`, and the check and reordering done at MCMC-build-time in `buildMCMC`.
- Furthermore, some of the original testing in the MCMC gold file failed - it did not give the same "exact results" in the presence of posterior predictive nodes, which is what we would expect - but I did verify that in those cases, the means and deviations of the differing posterior samples were the same, giving more verification of a correct implementation.
- I also tried to be extremely careful when thinking through all the logic here (FWIW).

Like I said, I'm pretty happy with this.
I welcome additional sets of eyes. It's a lot of changes.
Hi @danielturek @paciorek . First thanks for all of this. Here are quick thoughts to be followed by code review later.
- Can we bounce around names? For the options in `getNodeNames` and `getDependencies`, I am wondering if we can make the new argument names more purely about graph structure, without any implied use case. E.g., people might not be doing a Bayesian method. Would `includePred` and `predOnly` be sufficient? Is there a better way to name them than "predictive" nodes?
- Do we really definitely need also `include[Posterior]PredBranch` and `p[osteriorP]redBranchOnly`? Can you remind me what our definition of "branch" is? I think it is a sequence of nodes that may depend on each other but are jointly predictive nodes, in that they have no descendants that are data nodes. But does `includePred` or `includePredBranch` return just the point of departure, or the whole set? The labeling does not feel intuitive. Could `includeFirstPred` replace one of them? Or is there any way to avoid having two new pairs of options (i.e., four new options instead of two)? I might be lost - it has been a while since I've thought about this ball of wax.
- I worry about the potential for confusingly incorrect MCMC behavior if users are somehow customizing their configuration. Specifically, if theta -> x -> pred, and the user keeps the default sampler for x and also adds another sampler for x, the MCMC will only be correct if both are created with the same setting for `MCMCincludePredictiveDependencies`. I guess that must be the case, because that option will be checked during `buildMCMC` (rather than `configureMCMC`), right? Still, how could this go wrong, and how could we error-trap for it? Are there steps where we can store the value of `MCMCincludePredictiveDependencies` and then check that it hasn't been changed when it shouldn't be? I'm not coming up with a specific scenario here, but it seems worth worrying about.
- On the issue of when posterior predictive samplers (including branch samplers) should run: I do think our separate discussion (NCT 285) is relevant, so let me put the idea here as well. It would be to have a second list of nimbleFunctions in an MCMC that is called only immediately prior to saving samples, i.e., every `thin` iterations. The list would be similar to the list of sampler functions, in that it would be a `nimbleFunctionList` with a base class and a standardized system for how setup will be called. Other needs that could use this system are recording sums of log probs for arbitrary node sets, and online WAIC (which could be subsumed in this system). An advantage would be not simulating predictive nodes on iterations when their values will not be saved. It would seem "easy" to set up, in that we have the samplerFunctions list and this would be similar. However, one path could be to not do this now, but then do it in a subsequent distinct PR. It would obviate the need to place predictive samplers at the end of the list (and any potential incorrect results if that is not 100% foolproof). What do you think?
@danielturek @paciorek Some code notes:
- In the roxygen documentation strings for `getNodeNames`: The text implies posterior predictive nodes must be stochastic, but I think people use predictive nodes for all kinds of posterior derived quantities, including deterministic values. Is that right?
- Reading these doc strings tells me that branch nodes are the single nodes that start posterior predictive stuff. But it sounds like the node is a branch, i.e., the full branch. Not sure we can do better, but just noting.
- I'm a little resistant to `safeUpdateValidValues`, as the previous code there was old and stable and an efficiency concern. It seems like the new lines could take a similar form as the old ones? Is this fixing something that wasn't broken, or did an issue come up?
- What will be the behavior of `setPostPredNodeIDs` if the predictive node is a deterministic derived quantity node (e.g., ecologists often have something like `sum(z[1:m])` to add up the virtual population in the model, so they can save one summary node instead of a long vector of ones and zeros)?
- In `setPostPredNodeIDs`: (i) Repeated calls to `isEndNode(stochNonDataIDs)` could be avoided. That does repeated node name expansion, which can get costly for large models. (ii) Could `expandNodeNames(getParents(dataNodeIDs, stochOnly = TRUE), returnType = 'ids')` be replaced with `getParents(dataNodeIDs, stochOnly = TRUE, returnType = 'ids')` (possibly `returnScalarComponents = TRUE`)? I haven't checked, but it looks similar. (iii) Often when we've needed to iterate by node (instead of declaration), we encounter use cases where it is a bottleneck and then work on efficiency, but I guess we should wait and see.
- In `getDependencies`: (i) Is it the case that posterior branch nodes are always a subset of posterior nodes? If so, would a more natural default for the new options be `includePosteriorPredBranch = FALSE`, since the other one is `includePosteriorPred = TRUE`? It seems odd to make defaults that enforce redundant behavior.

Stopping here.
Oops I wasn't done with that and my fingers posted it by accident. Let me edit it!
More comments:
- MCMC_build 95-96: Would it work to save the result of `grepl('^posterior_predictive', samplerNames)` so we don't need to repeat it?
- I might not be following completely, but I'm wondering if `all(sapply(ppSamplerInd, function(ind) all(ind > otherSamplerInd)))` could be replaced with `min(ppSamplerInd) > max(otherSamplerInd)`.
- Nice error-trap of the case that `samplerExecutionOrder` was modified. OTOH, it seems it will be harder to use this feature (modification of `samplerExecutionOrder`, or use of the posterior predictive redesign) easily.
- The lines like MCMC_samplers 82 might in the longer run be more readable if not compacted like that. What do you think?
- Just checking that we shouldn't have uses of `decideAndJump` where the step of `mcmc_determineCalcAndCopyNodes(model, target)` repeats the same step in the sampler's setup code. It looks like we don't have any repeats like that, but just confirming.
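For what it's worth, the suggested simplification of the ordering check can be verified on a quick example (assuming, as the surrounding code implies, that `ppSamplerInd` and `otherSamplerInd` are non-empty integer vectors of sampler positions):

```r
## The two forms of "every PP sampler index exceeds every other index"
## agree whenever both vectors are non-empty.
ppSamplerInd    <- c(4L, 5L)
otherSamplerInd <- c(1L, 2L, 3L)

a <- all(sapply(ppSamplerInd, function(ind) all(ind > otherSamplerInd)))
b <- min(ppSamplerInd) > max(otherSamplerInd)
c(a, b)   # both TRUE
```

(If either vector could be empty, `min`/`max` on a zero-length vector warn and return `Inf`/`-Inf`, so a guard would be needed in that case.)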
Here is a model case to discuss:
A --> B --> C and also A --> C. C is posterior predictive.
Samplers are A', B', C' with targets A, B, C, respectively.
Sampler A' would involve B and C as dependencies, but C is omitted because it is posterior predictive.
User assigns their own sampler B' on target B, and their setup code does not respect the omission of posterior predictive nodes. That is, it includes C as a dependency.
Sampler order is A', B', C'. After sampler A' has operated, its value is no longer conditioned on the current value in C. When sampler B' operates, it will condition on the just-updated value of A and the current value in C. Will that not give incorrect MCMC results? A and C will be out of whack with each other when sampler B' uses them jointly.
A few comments on efficiency and error-trapping:

1. It looks like during model initialization, called from `modelDefClass$newModel`, data will always start empty, so the call to `setPostPredNodeIDs` from `init_isDataEnv` from `initialize` will have the known result that all nodes will look like posterior predictive nodes, because no nodes are flagged as data. Is that right? It could be costly to iterate over the entire model in `setPostPredNodeIDs` to determine that.
2. If there is a workflow with repeated calls to `model$setData`, after each one we will be iterating over the entire model to update posterior pred node IDs and branch IDs. I'm worried that could become a performance problem. We are not in the habit of workflows like:
```r
m$setData(y)
m$setData(z)
m$setData(w)
# etc.
```
but a user could do that, and it could be inefficient. Is there any way we could have `model$setData` update posterior pred node IDs based specifically on which nodes are being newly marked as data, and not go through the entire model each time?
3. In case these new pieces do create performance bottlenecks for some unforeseen models, could we add an option to completely skip over these steps?
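One possible incremental scheme for point 2 (a sketch under assumed toy structures, not nimble's API): when nodes are newly flagged as data, only their ancestors can lose predictive status, so it suffices to walk up the graph from the newly flagged nodes rather than re-scan the whole model.

```r
## Hypothetical sketch: incrementally clear the predictive flag when
## nodes become data. Only ancestors of newly flagged data nodes can
## change status, so we avoid iterating over the entire model.
parents <- list(theta = character(0), x = "theta", y = "x", pp1 = "x")
isPredictive <- c(theta = TRUE, x = TRUE, y = TRUE, pp1 = TRUE)  # no data yet

markData <- function(newDataNodes, isPredictive) {
    frontier <- newDataNodes
    while(length(frontier) > 0) {
        isPredictive[frontier] <- FALSE            # data nodes and ancestors
        frontier <- unique(unlist(parents[frontier]))
    }
    isPredictive
}

isPredictive <- markData("y", isPredictive)
isPredictive   # theta, x, y now FALSE; pp1 remains TRUE
```

The cost of each `setData` call then scales with the size of the affected ancestor set, not the model.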
@perrydv Thanks for such a thorough review, and the great suggestions. Trying to organize ideas here, I'm going to introduce letters to address different points.
(a) I'm fine with any other choices of the (new) argument names, which relate more directly to the graphical structure. Maybe something having to do with "trailing non-data"? But, maybe we should figure out (b) first.
(b) The definitions I used:
- Posterior predictive (PP) node: A stochastic node in the model, which itself is not data, and has no downstream stochastic dependent nodes which are data.
- Posterior predictive branch (PPB) node: A PP node which (i) has at least one downstream dependent stochastic (and therefore also PP) node, and (ii) is not itself a dependency of any other PP node. The PPB nodes are intended to be the unique set of (stochastic) "branching points", downstream of which are exclusively networks of non-data stochastic (and perhaps deterministic) nodes. I'd have to think carefully to say this with certainty, but perhaps every PPB node is either a top node, or has a direct parent which is a stochastic data node.
So: PPB nodes are a subset of the PP nodes. And, by my initial definitions, PP and PPB nodes never include any deterministic nodes.
Do we want to revisit that choice, and have PP nodes contain deterministic nodes as well? I now recognize the choices I made were done from an MCMC-implementation standpoint, where we're concerned with stochastic nodes. I still feel like perhaps PPB nodes should only be the stochastic branch points. But maybe also including deterministic nodes as PP nodes? e.g.: param -> latent -> data -> PP(determ) -> PPB(stoch) -> PP(determ) -> PP(stoch) -> PP(determ)
What do you guys think? Or, should the PPB node be the deterministic node? I'm not sure any longer. But from an MCMC sampling standpoint, it is useful to have the PPB nodes be the first stochastic nodes encountered in the network of PP nodes.
Making a change here would affect the arguments discussed in (a).
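Under the definitions in (b), the PPB set can be computed from the PP set by keeping PP nodes that have a downstream PP dependent and no PP parent. A toy base R sketch (all nodes stochastic for simplicity; structures are illustrative, not nimble internals):

```r
## PPB nodes: PP nodes with (i) at least one PP dependent and
## (ii) no PP parent.
parents  <- list(theta = character(0), x = "theta", y = "x",
                 pp1 = "x", pp2 = "pp1", pp3 = "pp1")
children <- list(theta = "x", x = c("y", "pp1"), y = character(0),
                 pp1 = c("pp2", "pp3"), pp2 = character(0), pp3 = character(0))
ppNodes <- c("pp1", "pp2", "pp3")   # assume the PP set is already known

isPPB <- vapply(ppNodes, function(n)
    any(children[[n]] %in% ppNodes) &&   # (i) has a downstream PP node
    !any(parents[[n]] %in% ppNodes),     # (ii) not a dependent of any PP node
    logical(1))
ppbNodes <- ppNodes[isPPB]
ppbNodes   # "pp1"
```

Here `pp1` is the branching point: its parent `x` is non-PP (it has data `y` downstream), while `pp2` and `pp3` hang below it.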
(c) About including all 4 of these new options. The only one I can't imagine a use case for is `includePosteriorPredBranch = FALSE`, where I'm not sure when you'd want to return a set of nodes including PP nodes but excluding PPB nodes. So maybe the 4 options could be simplified down to 3 (perhaps with different names):
- return PP only (default FALSE)
- return PPB only (default FALSE)
- include PP nodes (which also encompasses whether to return PPB nodes, since they are PP nodes) (default TRUE).
I would be ok with just these 3 options, but welcome other ideas.
(d) Duly noted, the discussion about a 2nd set of "samplers", which are only executed right before samples are recorded. I put "samplers" in quotes, because this could include other things like sum-logProbs-of-nodes calculations, WAIC, etc (NCT 285). I'm not against this idea, but my preference is to handle this PP revamp first, and other such "new systems" in a later addition.
(e) I'm not against back-tracking on the addition of `safeUpdateValidValues`, but I will point out about the old code: anytime the "bool-vector" approach was used, as for example:
```r
if(dataOnly) validValues[!boolIsData] <- FALSE
```
this is safe - even when `boolIsData` is all `TRUE`s, or all `FALSE`s, it's safe.
Any time the old code used the approach of indexing with a negated vector of map IDs, there's the potential for failure. This was handled correctly, with custom code, for the `latentOnly` case. But there could have been potential for failure with the `topOnly` and `endOnly` options, e.g., the old code:
```r
if(topOnly) validValues[-modelDef$maps$top_IDs] <- FALSE
```
which will fail if the referenced map IDs vector has length 0. Can this ever happen? Can we have a model with no top nodes, or no end nodes? I'm not sure. But if we did, and either of those maps (vectors of IDs) had length 0, this would fail.
Also noting, these changes take place in `getNodeNames`, which is never contained in DSL-compiled "run" code, and is usually only used when a user is interactively querying a model, or in `setup` code - so a small performance hit, in exchange for the standardized code...? It's a fair question.
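The empty-index edge case can be demonstrated directly in base R (a standalone illustration, using a plain logical vector in place of the model internals):

```r
## With a zero-length ID vector, the negated-index form is a silent
## no-op, while the bool-vector form carries out the intended
## "everything except these IDs" assignment.
top_IDs <- integer(0)          # hypothetical model with no top nodes

v1 <- rep(TRUE, 5)
v1[-top_IDs] <- FALSE          # -integer(0) is integer(0): nothing is set
v1                             # still all TRUE -- not the intent

boolIsTop <- rep(FALSE, 5)     # bool-vector equivalent of empty top_IDs
v2 <- rep(TRUE, 5)
v2[!boolIsTop] <- FALSE        # sets every element, as intended
v2                             # all FALSE
```

Note the failure mode here is a silently wrong result rather than an error, which is arguably worse.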
(f) At present, `setPostPredNodeIDs` only sets stochastic node IDs. By the current definitions, PP and PPB nodes are always stochastic. But we could change this, if that's the consensus, and I certainly see your point. This point (f) can be "closed", deferring further discussion to (b).
(g) Fair points about the efficiency questions in `setPostPredNodeIDs`. I will address those (don't let me forget) when this function gets reworked, as I'm guessing it will be.
(h) I personally like saving lines, as in MCMC_samplers line 82. It helps me look at the code. I know it's just stylistic, and sometimes I can be touchy about code style, but I'll defer to you guys on this. My preference is to keep as is.
(i) @paciorek Can we get your thoughts on this one: the discussions relating to: A --> B --> C and also A --> C ? Clarifying that nothing in this (A, B, or C) is data. And default samplers would all be PP or PPB samplers, but say a user adds their own sampler B' on B, which does not respect the "omit PP dependencies" option and therefore the sampler B' includes C as a dependency. Thus, copying this directly from @perrydv 's comment:
> Sampler order is A', B', C'. After sampler A' has operated, its value is no longer conditioned on the current value in C. When sampler B' operates, it will condition on the just-updated value of A and the current value in C. Will that not give incorrect MCMC results? A and C will be out of whack with each other when sampler B' uses them jointly.
(j) Regarding the comment about model initialization from `modelDefClass$newModel`, and unnecessary calculations taking place in the `setPostPredNodeIDs` function: I could modify the `init_isDataEnv` function so that, rather than calling `setPostPredNodeIDs`, it simply makes everything PP, since that will always be the case at that point. Does that work?
(k) I don't mind adding an option to "skip determination of PP nodes" in the model, in case it becomes a bottleneck. Shall we (I)?
I haven't been able to focus on the details of the whole discussion here, but in terms of the A->B->C, A->C example, that does seem like something that could cause incorrect results. We could take the example to also include B -> Y where Y is data for a situation where only C would have a predictive sampler assigned.
@danielturek My initial scenario was not entirely clear, and the intention was what @paciorek has indicated: Only C is posterior predictive.
If we agree this scenario is a problem, an alternative idea would be to have a system flag (or a model flag) that sets the posterior-predictive-returning behavior of getDependencies for all calls. Then for the entire processing of buildMCMC (all samplers' setup call), the model would return dependencies as if the posterior predictive nodes should never be considered. It would not be up to each and every sampler to use the same arguments reliably. Would it work?
@perrydv @paciorek I've thought about this a while. I cannot convince myself that it's not a potential problem, although I tried a number of test cases (manually tweaking whether samplers include the PP dependencies in calculations, then fitting models under different circumstances), and have not been able to identify a situation where it conclusively leads to incorrect posterior samples. But, that's definitely not sufficient reason to conclude "this isn't an issue".
Given that:
(a) If users exclusively use the package-provided samplers, then all samplers (which all use `mcmc_determineCalcAndCopyNodes` to determine the calculation nodes) will respect the new system option `MCMCincludePredictiveDependencies`, and everything works fine.
(b) Assuming we push this option "down" a level, so it's checked and followed by `getDependencies` (rather than by the `mcmc_determineCalcAndCopyNodes` function), then if users write and assign their own samplers using `getDependencies` to determine calculation nodes, then all samplers (both package-provided and custom-written) will respect the option, and work fine.
(c) Even if we make the change described in (b), if users write their own custom samplers which do not use `getDependencies` to determine calculation nodes, but instead hard-code the node names, then there is potential for failure. But (open for discussion) I'm not too worried about this case, because by writing custom samplers and hard-coding node names, anyone can easily get incorrect results, in fact much more easily than not.
So: would we all be OK with moving this system option down into `getDependencies`? In this case, the option would change from `MCMCincludePredictiveDependencies` to something like `dependenciesIncludePredictiveNodes`.
Thoughts?
I think I am happy with this plan, admittedly not having thought in depth about it. Not sure if it would make sense for us to have a call to discuss the revamp in general?
One more thought/question about this: I took some care in the (current) implementation to ensure that when PP nodes are excluded from the sampler calculations, the logProbs of the PP nodes are still kept current (that is, logProbs of PP nodes are still re-calculated using the new parent node values, even when the values of the PP nodes are excluded from the accept-reject decisions of the parent stochastic nodes). This is done, in part, using the `calcNodesPPskipped` variable, which is defined in the function `mcmc_determineCalcAndCopyNodes` in MCMC_utils.R.
Why am I mentioning this?
Well, even if we make the behavior of `getDependencies` follow a system option to determine whether it returns PP nodes - and we consider this "safe", because user-written samplers using `getDependencies` will then have the same behavior - I don't think user-defined samplers would generally also make sure to update the logProbs of PP nodes. That would leave those logProbs out of sync, which, although it would not disrupt other MCMC calculations, would at least be inconsistent with the PP and parent node values.
Also, it's an argument (even if we make the change being discussed) for keeping the new arguments `includePosteriorPred` and `posteriorPredOnly` to `getDependencies`, because these are useful (necessary?) for determining the `calcNodesPPskipped` variable.
Anyway, I hope this is clear.
Are you guys OK if we still leave the various options for `getDependencies`? Basically, the default behavior would be dictated by a new system option `dependenciesIncludePredictiveNodes`, as we discussed, but this behavior could be overridden using the arguments, specifically `includePosteriorPred` and `posteriorPredOnly`. This is important.
Minor followup on our conversation about this work in terms of WAIC. Conditional WAIC simply determines the data nodes and calculates their density given current values of other params in the model, so this revamp should not affect it. Marginal WAIC simulates over the user-specified nodes to marginalize over and then calculates the density of the data nodes, so it also should be unaffected.
@paciorek @perrydv What do you guys think about the following idea? I think it's a nice simplification.
Eliminate the notion of "posterior predictive branch points", and also eliminate the `posterior_predictive_branch_sampler`.
Instead: the `posterior_predictive` sampler updates all nodes downstream of its target (what the current PPB sampler does). And at the time of sampler assignment, a `posterior_predictive` sampler is assigned only to the first PP node encountered. (A `posterior_predictive` sampler would still be assigned, as always, to stand-alone trailing PP nodes; updating everything downstream, in that case, means only updating that single node.)
Basically, just simplifying things a lot. With the understanding that the `posterior_predictive` sampler takes care of everything downstream of its target node.
Thoughts?
(Thinking about why this has moved in the direction it has - it's only because we originally had only a PP sampler, since I never wrote code to determine the "branch" points. Then I wrote that code, and added a new PPB sampler. But arguably I should have originally written the PP sampler to act like the PPB sampler, which is what I'm suggesting here.)
@perrydv @paciorek I'm moving in that direction (described in my most recent comment), until I hear some reason against.
@danielturek This idea makes sense to me. I like the simplification. We could perhaps add a control parameter to turn off downstream updating and only update the individual target node. That would offer an opt-out. For example, maybe the downstream parts are computationally costly and a user decides they shouldn't be updated after all but doesn't want to change the model code.
The only other downside I see is that the downstream nodes will not appear as the target nodes of any sampler, which could be confusing for any inspection of what is being sampled. I guess a solution would be to have the setup code update the target to include all sampled nodes, not just the initial single target node provided as the `target` setup argument. Does that make sense?
@perrydv Thanks for the feedback and comments. For right now, I'm not going to try to update the `target` nodes variable, in part because the sampler setup code is executed at the time of `buildMCMC`, which would not help with the printing of samplers that usually takes place during MCMC configuration.
Fine by me. One question: will a user be confused that there is no sampler associated with the downstream nodes? Perhaps if you print out info about the sampler, it will summarize that it is sampling the downstream nodes too?
@paciorek Thanks for the feedback and the suggestion.
@perrydv @paciorek I’ve worked through our discussion of the posterior predictive revamp. Big picture, I’m satisfied with where this is at. I welcome any degree of comments or code review.
Below tries to describe the updated changes made in this PR. Please read it when you have some available mental bandwidth.
Any (former) notion of “posterior predictive model nodes” has been expunged.
There is a new definition of "predictive nodes". Predictive nodes are now defined as any stochastic model nodes which are not data and have no downstream data nodes.
There is a new term, "predictive root nodes" (formerly called the "branch nodes"). These are tracked in the model, but only because it requires zero additional processing to find them: they are determined during the course of finding the predictive nodes anyway, so I kept track of these "root" nodes as well. The predictive root nodes are not used anywhere yet, but could be in the future; the node IDs are tracked in the model.
There is a new system option called `determinePredictiveNodesInModel` (#1), with default `TRUE`. Setting this to `FALSE` will disable the routine to determine predictive nodes in a model, in case it causes a computational bottleneck. In the case of `FALSE`, no model nodes are flagged as predictive or predictive root nodes.
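As a sketch of how this option might be toggled (the model below is invented for illustration; the option name is the one introduced in this PR):

```r
## Disable predictive-node determination, e.g. for a very large model
## where the routine could become a bottleneck.
library(nimble)

nimbleOptions(determinePredictiveNodesInModel = FALSE)

code <- nimbleCode({
    mu ~ dnorm(0, 1)
    y ~ dnorm(mu, 1)       # will be set as data
    ypred ~ dnorm(mu, 1)   # non-data, no downstream data: would be flagged predictive
})
## With the option FALSE, no nodes in this model are flagged as predictive:
model <- nimbleModel(code, data = list(y = 1))

nimbleOptions(determinePredictiveNodesInModel = TRUE)  # restore the default
```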
There is a new system option called `getDependenciesIncludesPredictiveNodes` (#2), with default value of `TRUE`. See next paragraph.
`getDependencies` has two new arguments:
- `includePredictive`, with default value `getNimbleOption('getDependenciesIncludesPredictiveNodes')` (itself with default of `TRUE`), so the default behaviour of `getDependencies` is to include predictive nodes, the same as before
- `predictiveOnly`, with default value `FALSE`
`getNodeNames` has two new arguments:
- `includePredictive`, with default value `TRUE`
- `predictiveOnly`, with default value `FALSE`
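A quick sketch of both APIs with the new arguments (the model is invented; argument names are those added in this PR, and the comments reflect the intended behaviour described above):

```r
library(nimble)

code <- nimbleCode({
    mu ~ dnorm(0, 1)
    y ~ dnorm(mu, 1)       # data
    ypred ~ dnorm(mu, 1)   # predictive: non-data, no downstream data nodes
})
model <- nimbleModel(code, data = list(y = 1))

model$getNodeNames(stochOnly = TRUE)                             # includes 'ypred'
model$getNodeNames(stochOnly = TRUE, includePredictive = FALSE)  # excludes 'ypred'
model$getNodeNames(predictiveOnly = TRUE)                        # only 'ypred'

model$getDependencies('mu')                             # includes 'ypred' (default)
model$getDependencies('mu', includePredictive = FALSE)  # excludes 'ypred'
model$getDependencies('mu', predictiveOnly = TRUE)      # only predictive dependencies
```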
There is a new system option called `MCMCusePredictiveDependenciesInCalculations` (#3), with default value of `FALSE`. As the name suggests, this controls whether predictive node dependencies are used in internal MCMC sampler calculations. It operates by setting, prior to building samplers in the `setup` code of `buildMCMC`, the value of the system option `getDependenciesIncludesPredictiveNodes` to the current value of this new option `MCMCusePredictiveDependenciesInCalculations`. The original value of `getDependenciesIncludesPredictiveNodes` is restored after sampler building is completed.
- Care was taken to restore the original value of `getDependenciesIncludesPredictiveNodes` even if/when sampler building errors out prior to completion (an error during the building of any sampler function aborts execution of `buildMCMC`). This was a corner case I didn't see coming, and in some instances (for example, a sampler being assigned to a node on which it cannot operate) it resulted in the value of `getDependenciesIncludesPredictiveNodes` remaining as `FALSE`, and incorrect operation of `getDependencies` thereafter.
- However, when a sampler is assigned to a predictive node, this temporary change in the value of `getDependenciesIncludesPredictiveNodes` does not take place; the original value of `getDependenciesIncludesPredictiveNodes` is used. Thus, for example, if a `RW` sampler is assigned to a predictive node, the target node and its stochastic dependencies (all of which are predictive nodes) will all be used in the MH calculations of the `RW` sampler, regardless of the option `MCMCusePredictiveDependenciesInCalculations`.
- Error trap: if a multivariate sampler is assigned jointly to both predictive and non-predictive nodes, and the option `MCMCusePredictiveDependenciesInCalculations` is `FALSE`, then `buildMCMC` bails out and does not allow that case.
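A sketch of how option #3 might be used (the model is invented; the comments restate the behaviour described above, not verified output):

```r
library(nimble)

code <- nimbleCode({
    mu ~ dnorm(0, 1)
    y ~ dnorm(mu, 1)       # data
    ypred ~ dnorm(mu, 1)   # predictive node
})
model <- nimbleModel(code, data = list(y = 1))
conf <- configureMCMC(model)

## Default (FALSE): the sampler on 'mu' omits 'ypred' from its calculations.
mcmc <- buildMCMC(conf)

## Opt back in to the former behaviour, where 'ypred' contributes to the
## MH calculations of the sampler on 'mu':
nimbleOptions(MCMCusePredictiveDependenciesInCalculations = TRUE)
mcmcOld <- buildMCMC(conf)
nimbleOptions(MCMCusePredictiveDependenciesInCalculations = FALSE)  # restore default
```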
MCMC samplers no longer update the logProbs of any predictive nodes which were omitted from sampler calculations.
The `posterior_predictive_branch` sampler has been deprecated. The `posterior_predictive` sampler now fills that role, by simulating new values for all downstream dependencies and updating the associated logProb values. Default MCMC configuration will assign `posterior_predictive` samplers to the minimal necessary set of predictive nodes.
- The `posterior_predictive` sampler always includes predictive dependencies (for simulating and calculating), regardless of system options.
- The `posterior_predictive` sampler takes care not to compute the values of dependent deterministic nodes twice (which could easily otherwise happen, via both of the calls to `simulate` and `calculate`).
- The `posterior_predictive` sampler now checks (at build time) that it is assigned to a posterior predictive model node.
There is a new system option called `MCMCusePosteriorPredictiveSampler` (#4), with default value of `TRUE`. This (closely) replaces the old option about using the `posterior_predictive_branch_sampler`, which no longer exists. If `MCMCusePosteriorPredictiveSampler` is `FALSE`, then the default MCMC configuration will not assign any `posterior_predictive` samplers. Such a sampler could still be assigned, however, using `addSampler`.
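For example, a sketch of suppressing automatic assignment and then adding the sampler manually (invented model; option and sampler names from this PR):

```r
library(nimble)

code <- nimbleCode({
    mu ~ dnorm(0, 1)
    y ~ dnorm(mu, 1)
    ypred ~ dnorm(mu, 1)   # predictive node
})
model <- nimbleModel(code, data = list(y = 1))

nimbleOptions(MCMCusePosteriorPredictiveSampler = FALSE)
conf <- configureMCMC(model)   # no posterior_predictive sampler assigned to 'ypred'

## A posterior_predictive sampler can still be assigned manually:
conf$addSampler(target = 'ypred', type = 'posterior_predictive')

nimbleOptions(MCMCusePosteriorPredictiveSampler = TRUE)  # restore the default
```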
Internal to the MCMC configuration object, two new variables are maintained:
- A character vector of all non-data stochastic model nodes which are not the target of any current samplers.
- A character vector of predictive model nodes which are "downstream" of the target nodes of `posterior_predictive` samplers. That is, nodes which will be sampled by a `posterior_predictive` sampler, but are not the target node.
There is a new "Comments" section, optionally printed via the `show` method of the MCMC configuration object. The "Comments" section is only printed if at least one of the (currently only two) possible comments exists:
- A `[Note]` telling a user if there are additional downstream nodes being sampled by a `posterior_predictive` sampler, which are not otherwise displayed as target nodes.
- A `[Warning]` message if there are non-data stochastic model nodes which are not sampled. The message contains the number of such nodes, and suggests a method (`getUnsampledNodes`) for printing those node names. (This warning message is also printed, if applicable, at the time of `buildMCMC`.)
There is a new system option called `MCMCwarnUnsampledStochasticNodes` (#5), with default value of `TRUE`. If `FALSE`, then the aforementioned warning is suppressed.
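A sketch of triggering and inspecting the new warning (invented model; `getUnsampledNodes` and the option name are from this PR):

```r
library(nimble)

code <- nimbleCode({
    mu ~ dnorm(0, 1)
    y ~ dnorm(mu, 1)
})
model <- nimbleModel(code, data = list(y = 1))

conf <- configureMCMC(model)
conf$removeSamplers('mu')   # leave 'mu' unsampled, to trigger the warning

conf                        # show() prints the [Warning] comment
conf$getUnsampledNodes()    # prints the unsampled node names ('mu')

nimbleOptions(MCMCwarnUnsampledStochasticNodes = FALSE)  # suppress the warning
```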
Added testing for most of the above.
Updated NEWS.
@paciorek minor change to corner case testing of `sampler_CRP`, which is made in `test-bnp.R`, where it checks that the cluster nodes and the CRP node have the same set of stochastic dependents. This test failed because the "extra" dependent introduced in the test case was excluded from the relevant node set by `getDependencies`, since it was a predictive node. So I made the "extra" dependent data (in the test case), so that it is not excluded by `getDependencies` and still throws the relevant error (as the test routine expects).