
Adding capability to use temporally subsetted stream files

Open uturuncoglu opened this issue 5 years ago • 1 comments

Temporally subsetted stream files need to be handled by CDEPS, especially for high-resolution input datasets such as ERA5. To check this capability in the current version of CDEPS, I tested two scenarios. For both, I modified datm.streams.xml to include only the subsetted files:

<stream_data_files>
/glade/p/cesmdata/cseg/ufs_inputdata/atm/datm7/ERA5/ERA5.TL639.2019.08.200618_subset.nc
/glade/p/cesmdata/cseg/ufs_inputdata/atm/datm7/ERA5/ERA5.TL639.2019.09.200618_subset.nc
</stream_data_files>

and then I performed the following tests:

CASE 1: I created a subsetted dataset based on the simulation RUN_STARTDATE, so the data start exactly at RUN_STARTDATE and cover the entire simulation period.

CDEPS seems to work in this case and picks the correct index (1) from the file for the first time step.

CASE 2: I generated a subsetted file that starts at an earlier date (2019-08-28) than the simulation RUN_STARTDATE and again covers the entire simulation period. In this case, the first data read from the input file needs to belong to index 24 (for 2019-08-29), because the data file starts one day earlier than the simulation RUN_STARTDATE in this test.

In this case, CDEPS does not pick the correct date from the file and still tries to read the field from index 1 rather than index 24. I think this is because it assumes data for the entire year (via the start and end years in the XML definition) and a fixed time interval. For a more generic implementation, CDEPS needs to look at the dates (generated from the dataset time dimension: time units, calendar, etc.) rather than the indices.
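
To illustrate the date-based lookup, here is a minimal Python sketch (not CDEPS code). It assumes the time variable carries CF-style units such as "hours since 2019-08-28 00:00:00" and a standard Gregorian calendar; the function names and the hourly time axis are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta

# Seconds per supported time unit (assumption: only these units occur).
_SECONDS_PER_UNIT = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def decode_times(offsets, units):
    """Turn numeric offsets plus a '<unit> since <reference>' string into datetimes."""
    unit, _, ref = units.partition(" since ")
    reference = datetime.fromisoformat(ref.strip())
    scale = _SECONDS_PER_UNIT[unit.strip().lower()]
    return [reference + timedelta(seconds=o * scale) for o in offsets]


def index_for_date(times, wanted):
    """Return the 0-based position of `wanted` in the decoded time axis."""
    try:
        return times.index(wanted)
    except ValueError:
        raise ValueError(f"{wanted.isoformat()} not found in stream time axis") from None


# Hypothetical hourly axis starting one day before the run start date.
times = decode_times(range(72), "hours since 2019-08-28 00:00:00")
start_index = index_for_date(times, datetime(2019, 8, 29))
```

Under these assumptions the lookup depends only on the dates stored in the file, so it does not matter whether the file starts at RUN_STARTDATE or earlier. Note that the returned position is 0-based, whereas Fortran indexing would normally be 1-based.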

uturuncoglu avatar Sep 07 '20 04:09 uturuncoglu

@mvertens @jedwards4b @rsdunlapiv I was thinking about this issue recently, and I think there are a couple of solutions to make it work. Here are my suggestions:

1 - It is hard to manage all the different calendars and time units needed to convert time indices in the netCDF file to their date representation. The zeroth-order approach could be to create a text file alongside the data file that stores the date information, such as the following:

2019-08-01T00:00:00
2019-08-01T01:00:00
2019-08-01T02:00:00
2019-08-01T03:00:00
2019-08-01T04:00:00
2019-08-01T05:00:00
2019-08-01T06:00:00
2019-08-01T07:00:00
2019-08-01T08:00:00
2019-08-01T09:00:00
2019-08-01T10:00:00
2019-08-01T11:00:00

This kind of list can be generated with a simple CDO command, cdo -s showtimestamp, and CDEPS would look for this file to find the indices for the requested date. We don't need to create a text file for every stream file, but if CDEPS finds one in the data directory, it could try to use it. So the default behavior of CDEPS would not change.

2 - The main drawback of implementation (1) is that we need to manage an external text file for each stream file, which might not be the right direction for the future. Rather than creating an additional text file, we could store this information inside the stream file (as a new netCDF variable or as part of the global attributes), but this also requires modifying the stream file, which we might not want.

3 - Develop a Fortran interface to generate the dates and match them with the data files. I am not sure about existing implementations; if you know of one, we could try to use it in CDEPS, but as I already mentioned, it could be hard to support all the different cases with this approach.
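
As a small illustration of why calendars complicate option 3, the following Python sketch (not CDEPS code, and not Fortran) converts a day offset to a date in a 365-day "noleap" calendar. The function name and its arguments are hypothetical.

```python
# Month lengths in a 365-day (noleap) calendar: February always has 28 days.
DAYS_IN_MONTH = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)


def noleap_date(ref_year, ref_month, ref_day, days_offset):
    """Return (year, month, day) that lies `days_offset` days after the reference date."""
    # 0-based day of year of the reference date.
    day_of_year = sum(DAYS_IN_MONTH[: ref_month - 1]) + (ref_day - 1)
    total_days = ref_year * 365 + day_of_year + days_offset
    year, day_of_year = divmod(total_days, 365)
    month = 0
    while day_of_year >= DAYS_IN_MONTH[month]:
        day_of_year -= DAYS_IN_MONTH[month]
        month += 1
    return year, month + 1, day_of_year + 1
```

One day after 2020-02-28 is 2020-03-01 in this calendar but 2020-02-29 in the Gregorian calendar, so the same offset in the same file maps to different dates depending on the calendar attribute.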

Anyway, let me know what you think. I am already using the first approach to subset ERA5 files; an example script can be found at /glade/p/cesmdata/cseg/ufs_inputdata/atm/datm7/ERA5/subset.sh. It works fine, and it should be easy to implement a similar approach to find the matching indices in CDEPS.
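
A rough Python sketch of the lookup behind the first approach is given below. It is not the subset.sh script and not CDEPS code; the function names are hypothetical, and it assumes the text file holds ISO timestamps separated by any whitespace (one per line as in the list above, or several on one line).

```python
from datetime import datetime
from pathlib import Path


def load_timestamps(sidecar_path):
    """Read ISO timestamps from the text file, or return None if it is absent."""
    path = Path(sidecar_path)
    if not path.is_file():
        return None  # no text file: caller keeps the default behavior
    return [datetime.fromisoformat(token) for token in path.read_text().split()]


def record_index(sidecar_path, wanted):
    """Return the 0-based record position for `wanted`, or None without a text file."""
    stamps = load_timestamps(sidecar_path)
    if stamps is None:
        return None
    return stamps.index(wanted)  # raises ValueError if the date is not listed
```

Returning None when the file is missing mirrors the idea that the text file is optional and the existing index logic stays in place when it is not found.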

uturuncoglu avatar Sep 23 '20 17:09 uturuncoglu