get.parameter.samples: Joint sampling needs DB, multiple input files not well handled
Two related issues in get.parameter.samples:
No joint sampling without DB
Expected behavior: Joint sampling from any MCMC file, independent sampling from any post.distns (since these have no mechanism for recording covariance anyway).
Observed behavior: Parameters are sampled independently of each other unless all of these are true:
- bety connection available
- and settings$pft contains a valid posteriorid
- whose bety record points to a trait.mcmc file
- whose filename contains "mcmc.pda"
In all other cases, even when reading from an MCMC file, internal flag independent is set to TRUE and leads to calling get.ensemble.samples() with param.names = NULL (i.e. "please sample the following parameters jointly: None of them").
The get.ensemble.samples code contains several instances of an internal ma.results flag that I suspect was intended to serve the same purpose, but they are not currently used.
(As a side note, `get.ensemble.samples(..., param.names=NULL) being the expected spelling of "give me independent samples from all these parameters" is itself very confusing! But get.parameter.samples calls it correctly as documented and gets back the result it asked for.)
Proposed fix: Replace both independent and ma.results with a more explicit flag sample_jointly, set to TRUE when reading from MCMC and FALSE otherwise.
Parameters in MCMC ignored if not in prior
Expected behavior: either get.parameter.samples uses the parameters defined in exactly one of (in decreasing order of preference) trait.mcmc.Rdata, post.distns.Rdata, prior.distns.Rdata; or get.parameter.samples uses all parameters defined in any of those files, taking each parameter from the highest-preference file it appears in.
Observed behavior: Exactly one of post.distns.Rdata or prior.distns.Rdata is read as the prior, then trait.mcmc.Rdata is read if it exists, and logger output states that each pft "has MCMC samples for" all the parameters found in the MCMC file and that it "will use prior distributions for" all parameters found in the prior but not the MCMC. But the subsequent sampling loop is done over the names of the priors, meaning it silently skips any parameters that are present in the MCMC file but not in the prior. parameters present in both are taken from the MCMC file, as expected.
Proposed fix: If we want to use exactly one file, update file-reading logic and logger messages to match and use only that file's param names in the sampling loop. Or, if we want to use all parameters defined in any file, update sampling loop to operate over the union of all names found in prior.distns, post.distns, or MCMC output.
This issue probably has some overlap with #3662, which contemplates a change in how we specify the paths to each of these files.
a more explicit flag sample_jointly, set to TRUE when reading from MCMC and FALSE otherwise.
I'd propose "set TRUE by default, but sample marginally when insuffient information is provided to sample jointly". i.e. I'd like to future proof this against the foreseeable case where we analytically specify joint distributions with covariances
update sampling loop to operate over the union of all names found in prior.distns, post.distns, or MCMC output.
This would be what I suggest as the default, but I'd also suggest adding the option to specify a set of ignored parameters as a way to easily fix parameters at their defaults without having to go in and manually remove them from files. This would be compatible with the typical calibraiton workflow of (1) start with wide priors on lots of paramters (2) use trait data to update (3) perform SA to determine influential parameters, (4) fix noninfluential parameters at their means or medians (5) calibrate remaining parameters
sample marginally when insufficient information is provided to sample jointly
I like this concept but will need more detail to implement it.
option to specify a set of ignored parameters
This sounds like a new argument to get.parameter.samples, which is easy enough, but probably worth some thought on how it would be propagated from upstream (e.g. run.write.configs)
- Joint vs marginal: My recommendation for now would be just to have this as a parameter/flag that defaults to joint and then when we get to loading a prior or posterior table and find that things are set to "joint" that we (a) leave future selves a note inline about the need to support joint distributions and (b) send a message to logger saying something like "parameter X is not available jointly, sampling marginally"
- ignored parameters: Yep, I was envisioning this would be an addition to settings that would need to be passed as an optional argument.
Design for ignored parameters should also consider #3554, which wishes for similar param-by-param control over using values from IC files. These will probably be implemented in separate places but should work the same way.
@infotroph Thanks for digging into this! The proposed solution makes sense to me - union of all parameters, with priority mcmc > posterior > prior.