Improvements to single-variable time series generation

Open TeaganKing opened this issue 1 year ago • 3 comments

There are a few outstanding comments from #78 that are being moved to a new issue so that we can try to get the basic timeseries functionality merged in. Please see that PR for more details/context on these comments!

  • [ ] From @wwieder , "I wonder if a more generic approach [than using lev] may be to copy all the coordinate dimensions from a history file onto the single variable time series?"
  • [ ] Ensure code is generic enough for different history file types (e.g. patch or landunit level output from CLM, as opposed to grid cell averages)
  • [ ] vars_to_derive might be updated when https://github.com/NCAR/ADF/pull/278 comes in. From @nusbaume, "we could potentially use the sympy package to allow users to write their own equations for derived variables without having to write new python code for each individual derived quantity."
  • [ ] from @TeaganKing, "Each component will need to adjust the variables they want to generate (unless processing all vars), the relevant history string, and the height dimension they are using (lev is currently the default) in config.yml."
  • [ ] Process_all should be updated; I think there's some issue going on with vars not being available in all files; this might have to do with needing to change hist_string.
  • [ ] We also want to include an alternative to num_procs so that we don't surprise (especially non-NCAR) users with a somewhat hidden request for a particular number of processors
  • [ ] In order to manage which timeseries scripts we run (for which components), we can use the compute_scripts feature. This will also allow us to specify that timeseries should be run first. For details, see comments from @rmshkv in #88 and #78. @mnlevy1981 and I had also discussed including a --timeseries-only flag in order to run timeseries without running the notebooks, and renaming --timeseries to --timeseries-first to be clear that that particular flag is for running timeseries AND diagnostics notebooks. However, I think that using the compute_scripts ploomber capabilities would be ideal in this case.
  • [ ] Include **timeseries_params and use something like if type(arg1) == str: arg1 = [arg1] in timeseries.py instead of run.py in order to implement kwargs.
  • [ ] #90
  • [ ] from @wwieder Can we include additional data for some components? For example having the information about area and landfrac makes calculating global or regional sums easier.
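
For the **timeseries_params item above, here is a minimal sketch of what moving the string-to-list handling into timeseries.py might look like. The function names `as_list` and `normalize_timeseries_params` are hypothetical, as are the example keys; this is not the current CUPiD code, only an illustration of using `isinstance` in place of `type(arg1) == str`.

```python
def as_list(value):
    """Wrap a single string in a list; pass other iterables through as lists."""
    # isinstance is preferred over type(...) == str because it also
    # accepts subclasses of str.
    if isinstance(value, str):
        return [value]
    return list(value)


def normalize_timeseries_params(list_keys, **timeseries_params):
    """Return a copy of the keyword arguments with selected keys forced to lists.

    list_keys is a hypothetical argument naming the parameters that may be
    given either as one string or as a list of strings in config.yml.
    """
    normalized = dict(timeseries_params)
    for key in list_keys:
        if key in normalized:
            normalized[key] = as_list(normalized[key])
    return normalized
```

With this shape, run.py could pass the config section straight through, e.g. `normalize_timeseries_params(["case_name", "hist_str"], **params)`, and leave the type handling to timeseries.py.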

TeaganKing avatar Mar 18 '24 22:03 TeaganKing

We'd also like to use the case name from the global parameters rather than specifying a case within the timeseries parameters to avoid duplication.

TeaganKing avatar Jun 05 '24 21:06 TeaganKing

Also, the timeseries by default puts the post-processed files in $case/$comp/proc/tseries. This should also include the frequency, so month_1 for example.
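
A rough sketch of the directory layout suggested here, assuming the frequency is simply appended as one more path component. The helper name `tseries_output_dir` and its arguments are hypothetical and are not taken from the CUPiD source.

```python
from pathlib import PurePosixPath


def tseries_output_dir(root, case, comp, freq=None):
    """Build <root>/<case>/<comp>/proc/tseries[/<freq>].

    freq is an optional frequency label such as "month_1"; when it is
    omitted the current layout (no frequency subdirectory) is returned.
    """
    base = PurePosixPath(root) / case / comp / "proc" / "tseries"
    return base / freq if freq else base
```

For example, `tseries_output_dir("/archive", "mycase", "ice", "month_1")` gives `/archive/mycase/ice/proc/tseries/month_1`.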

dabail10 avatar Dec 06 '24 15:12 dabail10

Just an additional note on the 'process_all' thing. I get the following message with the sea ice history files. I think it is the grid variables. I think it is correctly generating the timeseries for all variables though.

ncrcat: ERROR no variables fit criteria for processing
ncrcat: HINT Extraction list must contain a record variable to concatenate. A record variable is a variable defined with a record dimension. Often the record dimension, aka unlimited dimension, refers to time. To change an existing dimension from a fixed to a record dimensions see http://nco.sf.net/nco.html#mk_rec_dmn or to add a new record dimension to all variables see http://nco.sf.net/nco.html#ncecat_rnm
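
If the message does come from grid variables that have no record (time) dimension, one possible workaround would be to separate those variables out before calling ncrcat on each one. The sketch below assumes the variable-to-dimensions mapping has already been read from a history file (for example via xarray or netCDF4); the helper name `split_record_vars` and the example variable names are illustrative only.

```python
def split_record_vars(var_dims, record_dim="time"):
    """Split variables into those with and without the record dimension.

    var_dims maps each variable name to a tuple of its dimension names.
    Returns (record_vars, static_vars), both in the original order.
    """
    record_vars = []
    static_vars = []
    for name, dims in var_dims.items():
        if record_dim in dims:
            record_vars.append(name)
        else:
            # No record dimension, so ncrcat has nothing to concatenate;
            # these could be copied onto the output once instead.
            static_vars.append(name)
    return record_vars, static_vars
```

Only the names in `record_vars` would then be passed to ncrcat, which should avoid the "no variables fit criteria" error under this assumption.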

dabail10 avatar Apr 22 '25 19:04 dabail10