
BiomeE interface and doc updates

Open fabern opened this issue 1 year ago • 6 comments

By default, BiomeE output is a data.frame with a single nested column (data) that holds three outputs: output_daily_tile, output_annual_tile, and output_annual_cohorts.

This could be made tidier with unnest_wider(), which unnests them into three separate (still nested) columns. (For multiple sites we might need unnest_wider(simplify = FALSE).)

rsofun::biomee_gs_leuning_output |> unnest_wider(data)
# # A tibble: 1 × 4
#   sitename output_daily_tile output_annual_tile output_annual_cohorts
#   <chr>    <list>            <list>             <list>               
# 1 CH-Lae   <df [365 × 35]>   <df [251 × 59]>    <df [10 × 35]>       

rsofun::biomee_p_model_output |> unnest_wider(data) 
# # A tibble: 1 × 4
#   sitename output_daily_tile output_annual_tile output_annual_cohorts
#   <chr>    <list>            <list>             <list>               
# 1 CH-Lae   <df [365 × 35]>   <df [251 × 59]>    <df [5 × 35]>        
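
A minimal sketch of this postprocessing on toy data, for illustration only. The toy tibble merely mimics the nested structure shown above; the helper toy_site_output(), the site names, and all values are made up and are not real rsofun output.

```r
library(tibble)
library(tidyr)

# Toy stand-in for one site's BiomeE output: a named list of three data.frames.
# Column names and values are illustrative only.
toy_site_output <- function() {
  list(
    output_daily_tile     = data.frame(year = 1, doy = 1:3, GPP = c(0.1, 0.2, 0.3)),
    output_annual_tile    = data.frame(year = 1:2, GPP = c(1.0, 1.1)),
    output_annual_cohorts = data.frame(cohort = 1:2, year = 1)
  )
}

toy <- tibble(
  sitename = c("site_A", "site_B"),
  data     = list(toy_site_output(), toy_site_output())
)

# Spread the nested list into three list-columns, one per output type.
# simplify = FALSE keeps every new column a list-column.
wide <- unnest_wider(toy, data, simplify = FALSE)

# Flatten one output type into a plain long table keyed by sitename.
daily <- unnest(wide[c("sitename", "output_daily_tile")], output_daily_tile)
```

Here wide has one row per site and one list-column per output type, while daily is an ordinary long table that can be plotted or summarised directly.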

Should we either:

  • [ ] a) document how to postprocess the default output (non-breaking change)?
  • [ ] b) change the default BiomeE output (breaking change)?

If we opt for b) we could then further:

  • [ ] c) harmonize the P-model output by renaming its nested output column from data to output_daily_tile (breaking change):
rsofun::p_model_output |> rename(output_daily_tile = data) # This is breaking for p-model
# # A tibble: 1 × 3
#   sitename site_info        output_daily_tile
#   <chr>    <list>           <list>           
# 1 FR-Pue   <tibble [1 × 4]> <df [2,190 × 21]>

What do you think about these breaking changes, @marcadella @stineb?

fabern avatar Nov 28 '24 18:11 fabern

Some further small gripes of mine. The idea is to fix them if we ever change the output data.frame. They are small hindrances that make it harder for newcomers to start using the package.

p-model-output:

  • [x] at the moment nothing here

BiomeE-output:

  • [x] The documentation of the BiomeE output data.frames lists output_daily_cohorts and output_hourly_tile (see here), but these two data.frames are never returned.
  • [x] Output names of BiomeE are not consistent across the different output data.frames, e.g. Trsp vs Transp vs transp. This is documented here and also copied in the code block below. Better to make them consistent (and potentially add _units to column names?)
  • [x] The units documentation for BiomeE (again here) does not specify the aggregation period for daily and hourly output fluxes. Is it per day and per hour, or per second? (The yearly output does specify it.)
  • [ ] 'year' in BiomeE output refers to simulation year (1=start of spin-up), whereas input forcing and params_siml_biomee$firstyeartrend are specified in AD-years.
res_biomee$output_daily_tile[[1]] |> tibble() |> names() |> dput()
c("year", "doy", "Tc", "Prcp", "totWs", "Trsp", "Evap", "Runoff", 
"ws1", "ws2", "ws3", "LAI", "GPP", "Rauto", "Rh", "NSC", "seedC", 
"leafC", "rootC", "SW_C", "HW_C", "NSN", "seedN", "leafN", "rootN", 
"SW_N", "HW_N", "McrbC", "fastSOM", "slowSOM", "McrbN", "fastSoilN", 
"slowSoilN", "mineralN", "N_uptk")

res_biomee$output_annual_tile[[1]] |> tibble() |> names() |> dput()
c("year", "CAI", "LAI", "Density", "DBH", "Density12", "DBH12", 
"QMD12", "NPP", "GPP", "Rauto", "Rh", "rain", "SoilWater", "Transp", 
"Evap", "Runoff", "plantC", "soilC", "plantN", "soilN", "totN", 
"NSC", "SeedC", "leafC", "rootC", "SapwoodC", "WoodC", "NSN", 
"SeedN", "leafN", "rootN", "SapwoodN", "WoodN", "McrbC", "fastSOM", 
"SlowSOM", "McrbN", "fastSoilN", "slowSoilN", "mineralN", "N_fxed", 
"N_uptk", "N_yrMin", "N_P2S", "N_loss", "totseedC", "totseedN", 
"Seedling_C", "Seedling_N", "MaxAge", "MaxVolume", "MaxDBH", 
"NPPL", "NPPW", "n_deadtrees", "c_deadtrees", "m_turnover", "c_turnover_time"
)

res_biomee$output_annual_cohorts[[1]] |> tibble() |> names() |> dput() #select(cohort, year, cID) |> distinct() # TODO: what is column cohort?
c("cohort", "year", "cID", "PFT", "layer", "density", "flayer", 
"DBH", "dDBH", "height", "age", "BA", "dBA", "Acrown", "Aleaf", 
"nsc", "seedC", "leafC", "rootC", "sapwC", "woodC", "nsn", "treeG", 
"fseed", "fleaf", "froot", "fwood", "GPP", "NPP", "Rauto", "Nupt", 
"Nfix", "n_deadtrees", "c_deadtrees", "deathrate")
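
As a rough sketch of what harmonizing the names could look like: the lookup below maps a few variants that are visible in the listings above (Trsp/Transp, seedC/SeedC, slowSOM/SlowSOM) onto one spelling. The helper harmonize_names(), the lookup, and the choice of target spelling are hypothetical and only for illustration; they are not what the package does.

```r
# Hypothetical mapping from variant names to one common spelling.
# The target names are an arbitrary choice for illustration.
name_lookup <- c(Trsp = "Transp", SeedC = "seedC", SlowSOM = "slowSOM")

# Rename every column that appears in the lookup; leave all others untouched.
harmonize_names <- function(df, lookup = name_lookup) {
  hits <- names(df) %in% names(lookup)
  names(df)[hits] <- unname(lookup[names(df)[hits]])
  df
}

# Toy data.frames using the inconsistent names from the listings above.
daily  <- data.frame(year = 1, doy = 1, Trsp = 0.5, seedC = 0.1, slowSOM = 2)
annual <- data.frame(year = 1, Transp = 180, SeedC = 0.1, SlowSOM = 2)

daily_h  <- harmonize_names(daily)
annual_h <- harmonize_names(annual)
```

After this step, the shared variables carry the same name in both toy data.frames, so they can be selected or joined without per-table special cases.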

fabern avatar Dec 11 '24 14:12 fabern

  • Pt 1: What do you mean by "output_daily_cohorts and output_hourly_tile are occasionally left out"?
  • Pt 4: BiomeE works exclusively with relative years (i.e. 1, 2, 3, ...). The parameter firstyeartrend is meant as an offset that anchors the generated time series to the historical time scale (e.g. 2013, 2014, 2015, ...). However, BiomeE never uses this parameter, so all outputs remain relative anyway. I documented this better in 9d612.

marcadella avatar Dec 12 '24 09:12 marcadella

  • Pt 1: output_daily_cohorts and output_hourly_tile are documented, but are never returned.
  • Pt 4: Yes, exactly. At the same time, the date column of the forcing is not used either. As it stands, a user could think that firstyeartrend can be used to output/simulate only a subset of the forcing data.frame(). To simplify, I would suggest removing the parameter firstyeartrend and simply re-using the first year of the forcing data.frame().

fabern avatar Dec 12 '24 13:12 fabern

  • Pt. 1: They should be removed from the doc
  • Pt. 2: Agreed.
  • Pt. 3: Not sure what you mean. Daily outputs have day⁻¹ where appropriate.
  • Pt. 4: I agree, but then date must be added in the doc for the driver (currently it is not a required field). In addition, the doc is wrong for tair and pair (which should be temp and patm).
  • Pt. 5: Regarding the main issue, it won't work with LULUC. See proposed output in #272 (section Multi LU output).
  • Pt. 6: While we are at it, we could also deprecate the spinup flag and, if it is not present, fill it internally with spinupyears > 0.
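
A minimal sketch of what Pt. 6 could look like, assuming the simulation parameters are a list or data.frame with a spinupyears entry. The helper name fill_spinup() is hypothetical and not part of rsofun.

```r
# Hypothetical helper: derive `spinup` from `spinupyears` when the flag is missing.
# An explicitly supplied `spinup` value is left untouched.
fill_spinup <- function(params_siml) {
  if (!"spinup" %in% names(params_siml)) {
    params_siml$spinup <- params_siml$spinupyears > 0
  }
  params_siml
}

with_spinup    <- fill_spinup(list(spinupyears = 250))                 # flag derived
without_spinup <- fill_spinup(list(spinupyears = 0))                   # flag derived
explicit       <- fill_spinup(list(spinup = FALSE, spinupyears = 250)) # flag kept
```

Keeping an explicit value untouched means existing scripts that still set spinup would behave as before during a deprecation period.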

marcadella avatar Dec 13 '24 16:12 marcadella

  • Pt. 3: No, they don't all have it. Carbon fluxes do, but water and nitrogen fluxes do not, i.e. Prcp, Trsp, Evap, Runoff, and N_uptk in output_daily_tile: https://github.com/geco-bern/rsofun/blob/1dbfe715b4d1b83e04389393761ff6a9e80aeab5/R/run_biomee_f_bysite.R#L52-L56 (The same applied to output_hourly_tile.) I think output_annual_tile is complete, but could you please double-check (and standardize year and yr while at it)?
  • Pt. 4: Yes, we can add these three to the docs.
  • Pt. 5: I don't see what your suggested combined LUs would look like. Could you give a code example that creates this data.frame()/tibble()? I'm not sure one would exclude the other. Could we have nested data.frames instead of nested lists?
  • Pt. 6: Yes, I agree with doing this, too. Would this also be needed for pmodel?

fabern avatar Dec 16 '24 07:12 fabern

Pt. 5:

# # A tibble: 1 × 5
#   sitename primary     secundary     crop           pasture
#   <chr>    <list>      <list>        <list>         <list>  
# 1 CH-Lae   <*>         <*>           <*>            <*>  

where <*> is a list made up of the three usual tibbles: output_daily_tile, output_annual_tile, output_annual_cohorts.
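
A toy sketch of how such a tibble could be built, assuming one list-column per land-use type as described above. The helper toy_lu_output() and all values are made up for illustration; the actual proposal is the one in #272.

```r
library(tibble)

# Toy stand-in for the per-land-use output: a list of the three usual tables.
# Column names and values are illustrative only.
toy_lu_output <- function() {
  list(
    output_daily_tile     = data.frame(year = 1, doy = 1:2, GPP = c(0.1, 0.2)),
    output_annual_tile    = data.frame(year = 1, GPP = 1.0),
    output_annual_cohorts = data.frame(cohort = 1, year = 1)
  )
}

# One row per site, one list-column per land-use type.
# Column names (including the spelling "secundary") follow the sketch above.
out <- tibble(
  sitename  = "CH-Lae",
  primary   = list(toy_lu_output()),
  secundary = list(toy_lu_output()),
  crop      = list(toy_lu_output()),
  pasture   = list(toy_lu_output())
)

# Accessing one table of one land-use type for the first site.
crop_annual <- out$crop[[1]]$output_annual_tile
```

In this shape each cell holds a plain list, so reaching a table takes a column name, a row index, and the table name.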

marcadella avatar Dec 16 '24 08:12 marcadella

  • Pt. 3 is fixed here: 6f128c03014c95df342fa7d6e5901c2dc34630f2
  • Pt. 2 is fixed here: 190fb94b7a8e68b3c684007f43c31db99d0680a8
  • Pt. 4 has not been implemented (date column of input forcing, etc.)

This issue will be closed, since the main request (from post 1) clashes with outputting land use change (LULUC).

fabern avatar Jun 10 '25 15:06 fabern