CTSM
Transient single-point dataset capability for subset_data
In moving from mksurfdata.pl to using subset_data for single-point datasets, one capability we have removed is the ability to create transient single-point datasets. We have this in place for smallville testing of dynamic landunits, and we have one test for a tower site with transient landuse changes.
This relates to:
#1664
Definition of done:
- [x] Implement a simple bash script to do this on the ctsm5.2 branch
- [x] Add to the Makefile on the ctsm5.2 branch
- [ ] Add this as a capability into subset_data
- [ ] Investigate: Smallville hist tests in the ctsm5.2 branch https://github.com/ESCOMP/CTSM/pull/2372 indicate that pct_nat_pft for 1850 is inconsistent between the fsurdat and landuse files generated by subset_data
This is potentially a useful feature to maintain, although I suspect it's not used often.
Can a user still run a generic single point case with global DATM inputs and the global land use time series? If so, this may provide enough functionality for the majority of cases?
Would being able to subset the land use time series file for a single grid provide additional flexibility for users to configure their own specific land use time series?
Are there urban configurations, especially with transient urban now enabled, where this would be helpful @olyson?
One more note, wasn't this something that @swensosc 's old subset_data script had a simple way of doing this that can be brought over into @negin513 's new script? https://github.com/ESCOMP/CTSM/blob/980b655984b366ba476c5d3f612374622f2e853a/tools/contrib/subset_surfdata#L120
Again, this may not be high priority, but we can discuss at our CLM meeting next week in conjunction with a broader datasets conversation.
You can use subset_data to subset from your landuse.timeseries file, but that's only going to work if you aren't overriding the PFT's for your data (same with the original subset_surfdata you show above). We have two sites we do that for: 1x1_numaIA and 1x1_brazil (so we can keep those two working). The issue with that is that in a normal tower site case you do override the PFT's, so the current transient capability is only going to work if you can happen to find a point from a global dataset that has the right PFT's to begin with. Possibly using a higher resolution global grid will help, but you are still likely to get a mix of PFT's even at our finest resolutions.
For the smallville site we constructed specific landuse changes that happened over a few years. We don't have any capacity to construct transient changes like that now. PTCLM also had the ability to do transient for the US-Ha1 tower site. It had a harvest that happened in 1946, and all the transient timeseries file does is add that one harvest. But, that is a useful feature. Again, that's the kind of thing that you have to construct rather than take from the global landuse.timeseries file.
For urban and in general, single-point transient capability would be useful for troubleshooting problems encountered in global or regional simulations. I can't think of any specific urban use cases other than that.
> One more note, wasn't this something that @swensosc 's old subset_data script had a simple way of doing this that can be brought over into @negin513 's new script?
> https://github.com/ESCOMP/CTSM/blob/980b655984b366ba476c5d3f612374622f2e853a/tools/contrib/subset_surfdata#L120
> Again, this may not be high priority, but we can discuss at our CLM meeting next week in conjunction with a broader datasets conversation.
Hello!
I think this capability already exists in the subset_data script using the --create-landuse option.
For example, if you run this, it will create the landuse file:
./subset_data point --create-landuse --include-nonveg --verbose
The code corresponding to it is here: https://github.com/ESCOMP/CTSM/blob/157f7192090267c4f65aaf5e31abfb45a16a4a22/python/ctsm/subset_data.py#L463-L468
and here:
https://github.com/ESCOMP/CTSM/blob/157f7192090267c4f65aaf5e31abfb45a16a4a22/python/ctsm/site_and_regional/single_point_case.py#L338-L395
@negin513 yes, as I say above, the ability to subset landuse.timeseries files exists. But, it's not going to function in a usable way if you are overriding the PFT's for the site. The landuse timeseries file from the global dataset is going to have a different PFT distribution that won't line up with what you want to override it with. To both override the PFT's and allow a transient change in time, there needs to be a mechanism to not only override the PFT's, but also give how it's going to change in time. And you also might want to specify the harvest for each year as well. I say this elsewhere -- we can use this capability for some specific sites: 1x1_numaIA, and 1x1_brazil (since we don't override the PFT's there). We can't use it for constructed transient changes like we do for 1x1_smallvilleIA and 1x1_US-Ha1. To catch the joke, Smallville IA is the place where Superman was raised, so it's not a real place, the PFT's and transient changes are completely made up. :-)
Oh! I see what you mean here. Thanks for clarifying it.
> But, it's not going to function in a usable way if you are overriding the PFT's for the site. The landuse timeseries file from the global dataset is going to have a different PFT distribution that won't line up with what you want to override it with.
- So possibly we should print out a warning or an error if the user specifies --dompft and --create-landuse at the same time?
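A minimal sketch of what such a check could look like, assuming an argparse-based command line. The parser below is a trimmed-down stand-in, not the real subset_data parser, and the function names `build_parser` and `check_pft_landuse_conflict` are hypothetical:

```python
import argparse


def build_parser():
    # Hypothetical, reduced parser: only the two options discussed here.
    parser = argparse.ArgumentParser(prog="subset_data_sketch")
    parser.add_argument("--dompft", nargs="+", type=int, default=None,
                        help="Dominant PFT(s) used to override the site PFTs")
    parser.add_argument("--create-landuse", action="store_true",
                        help="Subset the global landuse.timeseries file")
    return parser


def check_pft_landuse_conflict(args):
    """Return a warning message if both options are given, otherwise None."""
    if args.dompft is not None and args.create_landuse:
        return ("--dompft overrides the PFTs on the surface dataset, but "
                "--create-landuse subsets the global landuse.timeseries "
                "file unchanged, so the PFT distributions will not line up.")
    return None


args = build_parser().parse_args(["--dompft", "15", "--create-landuse"])
message = check_pft_landuse_conflict(args)
if message is not None:
    print("WARNING:", message)
```

Whether this should be a warning or a hard error is a design choice; returning the message lets the caller decide between logging it and aborting.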
> To both override the PFT's and allow a transient change in time, there needs to be a mechanism to not only override the PFT's, but also give how it's going to change in time. And you also might want to specify the harvest for each year as well.
- I understand now. That is an interesting idea. We probably want to think about the best way to implement this feature if possible.
> To catch the joke, Smallville IA is the place where Superman was raised, so it's not a real place, the PFT's and transient changes are completely made up. :-)
Haha! I did not know about Smallville. I kept thinking it is a real place. :smile:
In the ctsm SE meeting, we have decided on the following format for now:
year pft pft_weight
1950 1, 5 0.5, 0.5
1951 18, 22 0.3, 0.7
1952 18, 25 0.3, 0.7
An additional feature would be filling in the years if they don't exist using the previous line.
Thanks, @negin513 - that looks like a very good format. One minor detail is that I'd probably get rid of the spaces within a given area (like the 1, 5) and allow any mix of whitespace (spaces or tabs) in between the areas. That would let you do an initial split on whitespace, followed by a split on commas.
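To illustrate the split-on-whitespace-then-commas idea, together with filling in missing years from the previous line, here is a rough sketch. The function names are hypothetical, and the sample table skips 1952 on purpose to show the fill step:

```python
def parse_pft_timeseries(text):
    """Parse 'year pft pft_weight' lines into {year: (pfts, weights)}."""
    table = {}
    for line in text.strip().splitlines():
        fields = line.split()  # any mix of spaces or tabs between areas
        if not fields or not fields[0].isdigit():
            continue  # skip the header line and blank lines
        if len(fields) != 3:
            raise ValueError(f"expected 3 whitespace-separated areas: {line!r}")
        year = int(fields[0])
        pfts = [int(p) for p in fields[1].split(",")]
        weights = [float(w) for w in fields[2].split(",")]
        if len(pfts) != len(weights):
            raise ValueError(f"pft and pft_weight counts differ: {line!r}")
        table[year] = (pfts, weights)
    return table


def fill_missing_years(table):
    """Fill gaps between the first and last year using the previous entry."""
    filled = {}
    previous = None
    for year in range(min(table), max(table) + 1):
        previous = table.get(year, previous)
        filled[year] = previous
    return filled


example = """
year  pft    pft_weight
1950  1,5    0.5,0.5
1951  18,22  0.3,0.7
1953  18,25  0.3,0.7
"""
series = fill_missing_years(parse_pft_timeseries(example))
for year, (pfts, weights) in series.items():
    print(year, pfts, weights)
```

With no spaces inside an area, each data line always splits into exactly three fields, which makes malformed lines easy to detect.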
@erik - I'm not sure what you mean by having the mksurfdata_esmf Makefile build single point datasets. Currently CMake is being used - and a temporary Makefile exists just to build the executable. This Makefile will disappear once the next ESMF release takes place and we can point to a stable ESMF library. I'd like to understand this requirement in more detail - maybe in a meeting with @slevis and @ekluzek .
This is not a requirement for mksurfdata_esmf it's a requirement for the subset_data tool.
@ekluzek - thanks for clarifying. That makes sense.
We talked about this some at the CTSM software meeting this morning as this is needed to create single-point transient datasets. @negin513 and I are meeting on this tomorrow.
@negin513 and I met on this, and she has more comments coming. We worked out the UI for how this should work. She will also do the work needed for this. There is only one file that we need for this, for surface dataset generation, so we can wait on it for later.
@wwieder this was something that Negin was going to do, but obviously can't now. This is important for the CTSM5.2 in that there is one test dataset that needs this capability. Keeping this testing is important long term, but we maybe don't need to hold CTSM5.2 for it. I haven't looked into how long this would take to accomplish. But, do you have thoughts on if we should make it a requirement for CTSM5.2 or wait until post CTSM5.2?
My feeling is that we want to have the capability long-term, but if it isn't in place for CTSM5.2 we can probably pretty easily put together the needed transient dataset(s) through a manual / one-off process.
I agree with @billsacks here, this is something we want long term, but that doesn't need to hold up the CTSM5.2 development (or release). We can create the dataset needed for testing, with the understanding that at some point users will request this functionality with a modern code base.
Should we close this issue with a 'won't fix' label (for now) or leave it open?
@wwieder let's leave it open although I will put low priority for now. I needed to know what the plan for it was to know if it was something that CTSM5.2 should be held up for. So I'll adjust the CTSM5.2 project board as well.
As a way to do this in the short term I'm going to initially implement this with a simple bash script using NCO and the older file:
See
https://github.com/ESCOMP/CTSM/issues/1869#issuecomment-1692267883
Our current plan is for @slevis-lmwg to do the first step of doing this in a bash script.
...sorry for my confusion about the card associated with this issue. I put it back where I found it.
After talking to @slevis-lmwg we wanted to clarify the scope for this issue.
- Scientifically, I'd imagine that users can point to a global land use timeseries for a single point simulation that runs OK (we're already pointing to global lightning streams data).
- This makes me think the focus of this issue is more related to software testing (e.g., making sure we have smallville and tower tests that are working with CLM5 surface data).
@ekluzek can you help clarify if this assessment is accurate. If so, how critical is this capability before we bring in the CTSM5.2 tag?
@wwieder unfortunately I think this is important for our testing, and so critical to do. If it was just regular transient time-series it probably wouldn't be a big deal. But, this is how we test both transient-lake and transient-urban. See the test directories that use smallville...
smallville_dynlakes_monthly/user_nl_clm:flanduse_timeseries = '$DIN_LOC_ROOT/lnd/clm2/surfdata_map/landuse.timeseries_1x1_smallvilleIA_hist_78pfts_simyr1850-1855_dynLakes_c200928.nc'
smallville_dynurban_monthly/user_nl_clm:! The flanduse_timeseries file was created with the following NCL script (a copy of this script is in cime_config/testdefs/testmods_dirs/clm/smallville_dynurban_monthly):
smallville_dynurban_monthly/user_nl_clm:!flanduse_timeseries = '$DIN_LOC_ROOT/lnd/clm2/surfdata_map/landuse.timeseries_1x1_smallvilleIA_hist_78pfts_simyr1850-1855_dynUrban_c220223.nc'
smallville_dynurban_monthly/user_nl_clm:flanduse_timeseries = '$DIN_LOC_ROOT/lnd/clm2/surfdata_map/landuse.timeseries_1x1_smallvilleIA_hist_78pfts_simyr1850-1855_dynUrban_c220223.nc'
So this is important software testing, but also important testing of scientific features we need to keep working.
However, as I write this I realize that CTSM5.2 has transient lake and urban already in. So actually maybe we could remove those two tests (or modify them to do this with global datasets)?
It's probably OK to not have a single point test of transient flanduse_timeseries files; I'm pretty confident that is likely to be OK. Although long term we still want this capability.
So possibly the task is to make sure we have tests that ensure transient lake and urban are working? There could also be tests to make sure you can turn just those features on.
@wwieder and @slevis-lmwg what do you think?
OK, so the issue is really focused on testing. This will help us decide the prioritization for @slevis-lmwg to do this.
For testing purposes, it seems like this can kind of be a one-off: we just need a tool that creates the land use time series for point simulations that exercise lake, urban, (and other) features?
I agree that testing is important and will defer to you, Sam and @olyson about the best way to ensure good testing coverage for transient features we want to support with the CTSM5.2 dataset.
- In /glade/work/slevis/git/mksurfdata_toolchain/cime_config/testdefs/testmods_dirs/clm/smallville_dynlakes_monthly follow this order:
a) subset_data to generate landuse.nc for smallville by picking the correct lat/lon. From Erik's makefile: SUBSETDATA_1X1_SMALL := --lat 40.6878 --lon 267.0228 --site 1x1_smallvilleIA
b) trim output global file to 1850-1855 (can subset_data do that for me?)
c) ncap2 -s "PCT_LAKE=array(0.0,0.0,PCT_CROP); PCT_LAKE={0.,50.,25.,25.,25.,25.}; HASLAKE=array(1.,1.,AREA); PCT_CROP=array(0.0,0.0,PCT_LAKE); PCT_CROP={0.,25.,12.,12.,12.,12.}" landuse.timeseries_1x1_smallvilleIA_hist_78pfts_simyr1850-1855_cNEW_FILE.nc landuse.timeseries_1x1_smallvilleIA_hist_78pfts_simyr1850-1855_dynLakes_cNEWEST_FILE.nc
- Repeat for smallville_dynurban_monthly and likely a 3rd case; in step c) the ncap2 command will differ.
Make reproducible by placing in a script (bash) and test by running the smallville tests from testlists. Do this to test on derecho:
git show ctsm5.1.dev158:Externals.cfg > Externals.cfg
manage_externals/checkout_externals
Update
- a) In /glade/work/slevis/git/mksurfdata_toolchain/tools/site_and_regional, I executed:
./subset_data point --lat 40.6878 --lon 267.0228 --site 1x1_smallvilleIA --create-surface --create-landuse --crop
and got this file (in subdirectory /subset_data_single_point): landuse.timeseries_1x1_smallvilleIA_hist_78_CMIP6_1850-2015_c240103.nc
b) subset_data cannot trim the file, so I used ncks: ncks -d time,0,5 landuse.timeseries_1x1_smallvilleIA_hist_78_CMIP6_1850-2015_c240103.nc landuse.timeseries_1x1_smallvilleIA_hist_78_CMIP6_1850-1855_c240103.nc
c) ncap2 -s "PCT_LAKE=array(0.,0.,PCT_CROP); PCT_LAKE={0.,50.,25.,25.,25.,25.} ; PCT_LAKE_MAX=array(50.,50.,PCT_CROP_MAX); PCT_CROP=array(0.,0.,PCT_LAKE); PCT_CROP={0.,25.,12.,12.,12.,12.}; PCT_CROP_MAX=array(25.,25.,PCT_LAKE_MAX)" landuse.timeseries_1x1_smallvilleIA_hist_78_CMIP6_1850-1855_c240103.nc landuse.timeseries_1x1_smallvilleIA_hist_78pfts_1850-1855_dynLakes_c240103.nc
- c) ncap2 -s "PCT_URBAN=array(0.,0.,PCT_URBAN); PCT_URBAN={0.,0.,0.,20.,15.,0.,10.,8.,0.,10.,8.,0.,10.,8.,0.,10.,8.,0.} ; PCT_URBAN_MAX=array(0.,0.,PCT_URBAN_MAX); PCT_URBAN_MAX={20.,15.,0.}; PCT_CROP=array(0.,0.,PCT_LAKE); PCT_CROP={0.,25.,12.,12.,12.,12.}; PCT_CROP_MAX=array(25.,25.,PCT_LAKE_MAX)" landuse.timeseries_1x1_smallvilleIA_hist_78_CMIP6_1850-1855_c240103.nc landuse.timeseries_1x1_smallvilleIA_hist_78pfts_1850-1855_dynUrban_c240103.nc
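For the trimming step, the time indices passed to ncks follow directly from the years. A small sketch of that arithmetic, assuming one time record per year starting at the file's first year. The helper name `ncks_trim_command` is hypothetical, and it only builds the command without running it:

```python
import shlex


def ncks_trim_command(infile, outfile, first_year, start_year, end_year):
    """Build (not run) an ncks command that keeps start_year..end_year.

    Assumes one time record per year, with record 0 at first_year.
    """
    if not first_year <= start_year <= end_year:
        raise ValueError("need first_year <= start_year <= end_year")
    lo = start_year - first_year  # zero-based index of the first kept record
    hi = end_year - first_year    # zero-based index of the last kept record
    return ["ncks", "-d", f"time,{lo},{hi}", infile, outfile]


cmd = ncks_trim_command(
    "landuse.timeseries_1x1_smallvilleIA_hist_78_CMIP6_1850-2015_c240103.nc",
    "landuse.timeseries_1x1_smallvilleIA_hist_78_CMIP6_1850-1855_c240103.nc",
    first_year=1850, start_year=1850, end_year=1855,
)
print(shlex.join(cmd))
```

For 1850-1855 out of a file that starts in 1850 this gives time,0,5, i.e. six records, matching the command used above.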
I have not found a 3rd smallville case to address if there is one.
To make the steps reproducible, I created this script:
/glade/work/slevis/git/mksurfdata_toolchain/tools/modify_input_files/modify_smallville_w_dynurban_and_lake.sh
I updated Externals.cfg to dev159, ran ./manage_externals/..., and updated the 2 user_nl_clm files in the smallville testmod directories. The smallville tests PASS:
./create_test ERS_Lm25.1x1_smallvilleIA.IHistClm50BgcCropQianRs.derecho_gnu.clm-smallville_dynlakes_monthly
./create_test ERS_Lm25.1x1_smallvilleIA.IHistClm50BgcCropQianRs.derecho_gnu.clm-smallville_dynurban_monthly
@slevis-lmwg it looks like we removed testing for this dataset. I'll look into that some more...
Here is the previous file that was used:
/glade/campaign/cesm/cesmdata/cseg/inputdata/lnd/clm2/surfdata_map/release-clm5.0.18/landuse.timeseries_1x1_smallvilleIA_hist_78pfts_CMIP6_simyr1850-1855_c190214.nc
Note, it says 1850-1855, but it's really a constructed file that exercises specific landuse transitions in those 5 years. So it covers all the types of changes in a short test.
Look into creating that file and if it's easy enough we could add it back into our testing.
This is the info from the above landuse file that needs to be replicated. Each line is a year (1850-1855):
input_pftdata_filename =
"<pft_f>100</pft_f><pft_i>13</pft_i><harv>0,0,0,0,0</harv><graz>0</graz>",
"<pft_f>100</pft_f><pft_i>13</pft_i><harv>0,0,0,0,0</harv><graz>0</graz>",
"<pft_f>1,1,1,1,1,1,1,1,1,91</pft_f><pft_i>15,16,17,18,19,20,21,22,23,24</pft_i><harv>0,0,0,0,0</harv><graz>0</graz>",
"<pft_f>91,1,1,1,1,1,1,1,1,1</pft_f><pft_i>15,16,17,18,19,20,21,22,23,24</pft_i><harv>0,0,0,0,0</harv><graz>0</graz>",
"<pft_f>50,1,2,2,3,3,4,4,5,5,21</pft_f><pft_i>13,15,16,17,18,19,20,21,22,23,24</pft_i><harv>0,0,0,0,0</harv><graz>0</graz>",
"<pft_f>75,1,1,1,1,1,1,1,1,1,16</pft_f><pft_i>13,15,16,17,18,19,20,21,22,23,24</pft_i><harv>0,0,0,0,0</harv><graz>0</graz>" ;
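As a quick sanity check on these entries, here is a rough sketch that parses each string and confirms the PFT values sum to 100 for every year. It assumes pft_f holds percentages and pft_i the matching PFT indices; the function name `parse_pftdata_entry` is hypothetical:

```python
import re

ENTRIES = [
    "<pft_f>100</pft_f><pft_i>13</pft_i><harv>0,0,0,0,0</harv><graz>0</graz>",
    "<pft_f>100</pft_f><pft_i>13</pft_i><harv>0,0,0,0,0</harv><graz>0</graz>",
    "<pft_f>1,1,1,1,1,1,1,1,1,91</pft_f><pft_i>15,16,17,18,19,20,21,22,23,24</pft_i><harv>0,0,0,0,0</harv><graz>0</graz>",
    "<pft_f>91,1,1,1,1,1,1,1,1,1</pft_f><pft_i>15,16,17,18,19,20,21,22,23,24</pft_i><harv>0,0,0,0,0</harv><graz>0</graz>",
    "<pft_f>50,1,2,2,3,3,4,4,5,5,21</pft_f><pft_i>13,15,16,17,18,19,20,21,22,23,24</pft_i><harv>0,0,0,0,0</harv><graz>0</graz>",
    "<pft_f>75,1,1,1,1,1,1,1,1,1,16</pft_f><pft_i>13,15,16,17,18,19,20,21,22,23,24</pft_i><harv>0,0,0,0,0</harv><graz>0</graz>",
]


def parse_pftdata_entry(entry):
    """Extract PFT percentages by index, plus harvest and grazing values."""
    def tag(name):
        match = re.search(rf"<{name}>(.*?)</{name}>", entry)
        if match is None:
            raise ValueError(f"missing <{name}> in {entry!r}")
        return [float(v) for v in match.group(1).split(",")]

    pct, idx = tag("pft_f"), tag("pft_i")
    if len(pct) != len(idx):
        raise ValueError(f"pft_f and pft_i lengths differ in {entry!r}")
    return {
        "pft_pct": {int(i): p for i, p in zip(idx, pct)},
        "harvest": tag("harv"),
        "grazing": tag("graz")[0],
    }


# One entry per year, starting at 1850 (per the note above).
for year, entry in zip(range(1850, 1856), ENTRIES):
    parsed = parse_pftdata_entry(entry)
    total = sum(parsed["pft_pct"].values())
    print(year, sorted(parsed["pft_pct"]), total)
```

A parsed form like this could serve as the reference when checking a regenerated file against the old one.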