pecan icon indicating copy to clipboard operation
pecan copied to clipboard

Avoid overloading `pft$outdir` as an input path

Open infotroph opened this issue 1 month ago • 6 comments

Bug Description

Context: get.parameter.samples expects two paths for each PFT:

  • pfts$<pft>$posterior.files is a path including filename (which is typically post.distns.Rdata inside a PFT-specific directory) for an Rdata file containing the parameters characterized by named distributions
  • pfts$<pft>$outdir is a path to a directory to look in for MCMC results (with the filename hard-coded as trait.mcmc.Rdata) from the parameters whose distribution is not easily named.

Bug: Both of these paths are inputs to get.parameter.samples, so needing to specify the MCMC location in outdir is confusing and (best I can tell) undocumented. It could make sense in the context of a workflow that always runs meta-analysis and therefore always recreates these files, but given my posteriors are generated in a separate offline calibration process I want to store them in a path that is separate from the easily-recreated workflow outputs. Since pfts$<pft>$outdir is also used by run.sensitivity.analysis (and possibly others) to write their own outputs, as far as I can tell there's currently no way to pass MCMC files from a readonly location -- my [gollum voice] preciousssss input artifacts (or copies/links to them) have to be mixed in with outputs.

Possible fixes

My preference would be to have separate settings for each input and output, and to specify them in the block for each step rather than together in the pft block. Where a file is updated by multiple steps it may make more sense to have a setting that identifies it by purpose in a way that makes sense both for steps that take it in and put it out:

  • get.parameter.samples reads the MCMC file from pfts$<pft>$posterior.mcmc.file and the distribution file from pfts$<pft>$posterior.distribution.file (or possibly from pfts$<pft>$posterior.files$mcmc and pfts$<pft>$posterior.files$distribution respectively)
  • Meta-analysis takes its output directory from a setting in the meta.analysis block rather than the pfts block, possibly updating the pfts block to point to the relevant files after writing them
  • sensitivity analysis takes its output directory from a setting in the sensitivity.analysis block, separating PFTs as subdirs within the specified path. It does not update the pfts block because none of its outputs change how downstream steps use a given PFT (right?)

infotroph avatar Nov 14 '25 21:11 infotroph

I'd lean towards having the posterior files stay at pfts$<pft>$posterior.files and update the code that reads these to be smart enough to distinguish a mcmc file from a post.distns.Rdata. pfts$<pft>$outdir should say as the folder used by the meta-analysis and calibration code for writing output.

I don't think the meta-analysis or sensitivity analyses need their own separate outdirs because they shouldn't be writing files with the same names, and the sens analysis shouldn't modify the meta-analysis (or calibration) parameters

mdietze avatar Nov 14 '25 21:11 mdietze

Implemented in branch: fix/3662-avoid-pft-outdir-as-input

• get.parameter.samples no longer relies on pft$outdir for inputs when explicit paths are provided. • posterior.files now supports list-of-lists per PFT: distribution (post.distns/prior.distns) and mcmc (trait.mcmc*.Rdata). • Backward compatible; same fallbacks if not provided.

shbhmexe avatar Nov 17 '25 12:11 shbhmexe

@shbhmexe Thanks for the PR, but I don't know that we've converged on a spec for the fix yet.

@mdietze 👍 to taking both distns and mcmc from posterior.files. I'm picturing keeping it as a single path in the XML, and having the code first check if it's a directory (to look in for both post.distns.Rdata and trait.mcmc.Rdata) or a single file ( to treat as the path to post.distns). I'm agnostic whether to proceed without trait.mcmc if it's not found in the posterior.files or to also look in outdir for backward compatibility.

I'm less convinced about keeping pfts$<pft>$outdir for new analyses: Happy to stipulate that MA and SA shouldn't be producing file collisions and I see the logic for treating MA as having one canonical result per PFT. But since one PFT can produce wildly different SA results depending on non-PFT factors (climate, site conditions), my instinct has always been to think the results should be "owned by" the SA and then be confused that their path is defined in pfts instead of in sensitivity.analysis.

infotroph avatar Nov 19 '25 17:11 infotroph

@infotroph I get the point about SA not mapping to pfts now that we're doing multi-site SA. I'm OK with the proposed restructuring to have a directory (or directory hierarchy) for SA outputs. I'm assuming the MA and calibration would stay with the pft$outdir?

mdietze avatar Nov 19 '25 18:11 mdietze

MA and calibration would stay with the pft$outdir?

@mdietze Yes for this change, just to avoid scope creep:

  • For MA. I could see putting the outdir in meta.analysis for symmetry with SA, but I don't think it makes a big functional difference
  • PDA has enough moving pieces (and I see it rewrites pft$outdir as it runs) that I'll defer to folks who've used it more.

infotroph avatar Nov 19 '25 19:11 infotroph

On surprise factor, see also #3131

infotroph avatar Nov 25 '25 18:11 infotroph