Avoid overloading `pft$outdir` as an input path
Bug Description
Context: get.parameter.samples expects two paths for each PFT:
pfts$<pft>$posterior.filesis a path including filename (which is typicallypost.distns.Rdatainside a PFT-specific directory) for an Rdata file containing the parameters characterized by named distributionspfts$<pft>$outdiris a path to a directory to look in for MCMC results (with the filename hard-coded astrait.mcmc.Rdata) from the parameters whose distribution is not easily named.
Bug: Both of these paths are inputs to get.parameter.samples, so needing to specify the MCMC location in outdir is confusing and (best I can tell) undocumented. It could make sense in the context of a workflow that always runs meta-analysis and therefore always recreates these files, but given my posteriors are generated in a separate offline calibration process I want to store them in a path that is separate from the easily-recreated workflow outputs. Since pfts$<pft>$outdir is also used by run.sensitivity.analysis (and possibly others) to write their own outputs, as far as I can tell there's currently no way to pass MCMC files from a readonly location -- my [gollum voice] preciousssss input artifacts (or copies/links to them) have to be mixed in with outputs.
Possible fixes
My preference would be to have separate settings for each input and output, and to specify them in the block for each step rather than together in the pft block. Where a file is updated by multiple steps it may make more sense to have a setting that identifies it by purpose in a way that makes sense both for steps that take it in and put it out:
get.parameter.samplesreads the MCMC file frompfts$<pft>$posterior.mcmc.fileand the distribution file frompfts$<pft>$posterior.distribution.file(or possibly frompfts$<pft>$posterior.files$mcmcandpfts$<pft>$posterior.files$distributionrespectively)- Meta-analysis takes its output directory from a setting in the
meta.analysisblock rather than thepftsblock, possibly updating thepftsblock to point to the relevant files after writing them - sensitivity analysis takes its output directory from a setting in the
sensitivity.analysisblock, separating PFTs as subdirs within the specified path. It does not update thepftsblock because none of its outputs change how downstream steps use a given PFT (right?)
I'd lean towards having the posterior files stay at pfts$<pft>$posterior.files and update the code that reads these to be smart enough to distinguish a mcmc file from a post.distns.Rdata. pfts$<pft>$outdir should say as the folder used by the meta-analysis and calibration code for writing output.
I don't think the meta-analysis or sensitivity analyses need their own separate outdirs because they shouldn't be writing files with the same names, and the sens analysis shouldn't modify the meta-analysis (or calibration) parameters
Implemented in branch: fix/3662-avoid-pft-outdir-as-input
• get.parameter.samples no longer relies on pft$outdir for inputs when explicit paths are provided. • posterior.files now supports list-of-lists per PFT: distribution (post.distns/prior.distns) and mcmc (trait.mcmc*.Rdata). • Backward compatible; same fallbacks if not provided.
@shbhmexe Thanks for the PR, but I don't know that we've converged on a spec for the fix yet.
@mdietze 👍 to taking both distns and mcmc from posterior.files. I'm picturing keeping it as a single path in the XML, and having the code first check if it's a directory (to look in for both post.distns.Rdata and trait.mcmc.Rdata) or a single file ( to treat as the path to post.distns). I'm agnostic whether to proceed without trait.mcmc if it's not found in the posterior.files or to also look in outdir for backward compatibility.
I'm less convinced about keeping pfts$<pft>$outdir for new analyses: Happy to stipulate that MA and SA shouldn't be producing file collisions and I see the logic for treating MA as having one canonical result per PFT. But since one PFT can produce wildly different SA results depending on non-PFT factors (climate, site conditions), my instinct has always been to think the results should be "owned by" the SA and then be confused that their path is defined in pfts instead of in sensitivity.analysis.
@infotroph I get the point about SA not mapping to pfts now that we're doing multi-site SA. I'm OK with the proposed restructuring to have a directory (or directory hierarchy) for SA outputs. I'm assuming the MA and calibration would stay with the pft$outdir?
MA and calibration would stay with the pft$outdir?
@mdietze Yes for this change, just to avoid scope creep:
- For MA. I could see putting the outdir in
meta.analysisfor symmetry with SA, but I don't think it makes a big functional difference - PDA has enough moving pieces (and I see it rewrites pft$outdir as it runs) that I'll defer to folks who've used it more.
On surprise factor, see also #3131