pecan
mismatch between ED run period and outputs
A question for ED users: I'm doing a run from 2005/01/01 to 2006/12/31 and I'm having a couple of issues with the outputs:
- It writes out an analysis-T-2004-00-00-000000-g01.h5 file which has 11 non-zero values in it.
- The logfile says it starts the sim from 01/01/2005 00:00:00 UTC and the last date printed out is 12/30/2006 00:00:00 UTC, so it misses the whole of 12/31/2006 at the end.
- The analysis-T-2006-00-00-000000-g01.h5 file has 17461 non-zero values in it, whereas it's supposed to have 17520.
- This 17461 value is consistent with the missing day (24*2 = 48) plus the 11 values in the 2004 file (17461 + 48 + 11 = 17520).
- The 2005.nc file post-processed by pecan starts with the values in the analysis-T-2005-00-00-000000-g01.h5 file, ignoring the 11 values in the 2004 file.
- Finally, the 2006.nc file post-processed by pecan also ends with 59 zero values.
Btw, all analysis-T-YYYY-00-00-000000-g01.h5 files have doy*24*2 values regardless of how many actual values were computed for that year. ED just fills the rest with zeros.
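The bookkeeping in the list above can be checked with a few lines of R. This is only a sketch: the counts are the ones reported in this post, and the variable names are made up for illustration.

```r
# Half-hourly output: 2 values per hour, 24 hours per day.
steps_per_day <- 24 * 2
expected_2006 <- 365 * steps_per_day  # slots in a non-leap-year file

nonzero_2006 <- 17461          # non-zero values reported for the 2006 file
nonzero_2004 <- 11             # values that ended up in the 2004 file
missing_day  <- steps_per_day  # 12/31/2006 was never simulated

# Trailing zeros seen in 2006.nc: the missing day plus the spilled values.
trailing_zeros <- missing_day + nonzero_2004
accounted <- nonzero_2006 + trailing_zeros

print(c(expected = expected_2006, accounted = accounted, zeros = trailing_zeros))
```

If the two totals agree, the 59 trailing zeros are fully explained by the skipped last day and the values written to the previous year's file.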
These are the values in ED2IN:
NL%IMONTHA = 01
NL%IDATEA = 01
NL%IYEARA = 2005
NL%ITIMEA = 0000
NL%IMONTHZ = 12
NL%IDATEZ = 31
NL%IYEARZ = 2006
NL%ITIMEZ = 0000
The site I'm running is Bartlett, UTC-05:00 (EST) and UTC-04:00 (EDT). So I could understand why it produces values for 2004, and why there could be 10 values (5*2 = 10), but what's up with 11 values? Does it have something to do with Fortran indices? Or with changing daylight saving time?
Finally, how can I resolve this issue? (I tried giving runs from 01/01/2005 0500 to 01/01/2007 0500; it still produces 1 value in 2004, and at the end, in 2006, it skips the whole of December and complains that there is no 2007JAN.h5.)
correct me if I'm wrong, but the only adjustment for timezone issues is in met2model.ED. And it just adds buffer in front of the time-series which results in losing data points at the end of the original data time-series. Not sure if that makes sense
write.configs only writes day, month and year; there is no tag for manipulating ITIMEA and ITIMEZ, they're 0000 in the templates
model2netcdf.ED just processes whatever time-period is requested
@istfer I have also noticed for my runs that an analysis-T-2005-00-00-000000-g01.h5 file gets made, even when I'm starting from 2006 (at Duke Forest).
I checked my "extra" analysis-T-2005 file.
GROUP "/" {
DATASET "BASEFLOW" {
DATATYPE H5T_IEEE_F32LE
DATASPACE SIMPLE { ( 17520, 1 ) / ( 17520, 1 ) }
DATA {
(0,0): 3.52222e-08,
(1,0): 3.5223e-08,
(2,0): 3.52239e-08,
(3,0): 3.52248e-08,
(4,0): 3.52257e-08,
(5,0): 3.52265e-08,
(6,0): 3.52274e-08,
(7,0): 3.52283e-08,
(8,0): 3.52292e-08,
(9,0): 3.523e-08,
(10,0): 3.52309e-08,
(11,0): 0,
(12,0): 0,
(13,0): 0,
(14,0): 0,
(15,0): 0,
I found similar numbers to you Istem, but one off (i.e., I have 10 non-zero values, and 17519 total).
thanks @mccabete
hmm where to address this then
I feel like it would be the easiest to handle it in the post processing, rather than adding buffer to the met driver and shift everything awkwardly for timezone match
we can let ED do whatever it's doing; in model2netcdf we can look for different analysis-T-YYYY files if the site is not in the UTC zone, read from them and concatenate?
different flavors of what happens to the model-data comparison in the 2006 part of one of my 2005/06/01 - 2006/12/31 runs (red : data, blue : model output read by pecan workflow, black : model output read and aligned manually)
to make things more complicated this is only visible in the second year if you're starting the run from the middle of the year as in my 2005/06/01 - 2006/12/31 example. If you're starting your runs from Jan 1st, mismatch appears in the first year as well
I could be wrong, but I think @istfer's issue has a simpler fix: ED runs stop as soon as the end date is reached, i.e. the end date is not included in the run. For complete ED runs, I'm pretty sure you'll need to give the actual end date + 1. So you'll need your end date as 2007/01/01.
It won't matter what met goes to 2007/01/01 since it never gets run. (Default will be to recycle the met)
I tried giving different start/end dates, it complained that there is no 2007JAN.h5 but I guess this is solvable by changing a flag in ED2IN (so that it recycles the met, but doesn't run 2007/01/01)
However, the real issue is that it shifts the outputs according to timezone (e.g. writes the first few data points of a 2005 run to a 2004.h5 file which pecan doesn't process, and shifts the rest accordingly such that last data points in 2006 are just zeros -in addition to zeros due to not running the last day issue-). Still, it would be nice if it can be handled on the write.configs end by shifting start/end times accordingly
back to this. @mdietze what do you think?
So, there was an issue due to the hardcoded -21600.0d0s in ED2's timestamps (also see issue #231). Once we get rid of that, there remain some other issues to be resolved. My example here is a 2005-2006 Bartlett (lst-5) run, tower files.
- The simulation of 2005 starts from 2005 1 1 3000 and ends at 2005 12 31 233000. As a result 2005 has 17519 non-zero values and a padding zero at the end. This could be resolved by calling h5_output for OPTI at the beginning of the loop in ed_model, like it's being done for INST. This way the 2005 simulation will start from 2005 1 1 0 and the zero will be at the beginning of the 2005 output instead of the end.
- The simulation of 2006 starts from 2006 1 1 0 and ends at 2006 12 31 0. ED2 stops as soon as it sees the end time given in its ED2IN, which is in this case:
NL%IMONTHZ = 12
NL%IDATEZ = 31
NL%IYEARZ = 2006
NL%ITIMEZ = 0
as written by the pecan workflow. As a result 2006 has 17473 non-zero values with 47 padding zeros. This is not a big deal, but the runtime can be extended if we change NL%ITIMEZ to 2330 or something like that.
- The real issue is that we now shift everything in met-process for lst-5 and add a buffer in front (also missing data points at the end), but we don't do any correction in model2netcdf.ED2, which results in the following comparison with the data:
Once I manually move the padding zero from the end of 2005 to its beginning and correct for the timezone, I get the right match:
So, model2netcdf.ED2 needs to correct for lst as well. This also means the way we process it now needs to change, i.e. we can't just loop over the years one by one; some of the 2006-T- values belong to 2005-T- (they need to appear in 2005.nc, not in 2006.nc, from pecan's perspective).
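A rough sketch of what re-binning across years could look like, in R. Nothing here is existing PEcAn code: rebin_years is a hypothetical helper, the toy data stand in for values read from the analysis-T files, and it assumes the values are stamped in UTC while the site lags UTC by `off` time steps (10 for lst-5 at half-hourly output).

```r
# Hypothetical helper: move values across year boundaries after a time shift.
# `yearly` is a named list with one numeric series per calendar year (UTC).
# `off` is the number of steps by which local time lags UTC.
rebin_years <- function(yearly, off) {
  lens <- lengths(yearly)
  all_vals <- unlist(yearly, use.names = FALSE)
  n <- length(all_vals)
  # Drop the first `off` values (they fall in the previous local year)
  # and pad the end with NA, where no model output exists.
  shifted <- c(all_vals[seq_len(n - off) + off], rep(NA_real_, off))
  # Cut the shifted series back into pieces of the original yearly lengths.
  split(shifted, rep(names(yearly), times = lens))
}

# Toy example: 4 steps per "year", site one step behind UTC.
toy <- list("2005" = c(1, 2, 3, 4), "2006" = c(5, 6, 7, 8))
out <- rebin_years(toy, 1)
print(out)
```

In the toy run, the first value of the UTC 2006 series ends up at the end of the local 2005 series, which is the kind of move described above. The values dropped from the front would have to come from, or go to, the preceding year's file in a real implementation.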
The other alternative is to change ED2 code to assume UTC0 everywhere (such that no shift is needed in pre/post process), but @mdietze found this to be more of a hack rather than a fix
After today's meeting, adjusting for timezone in model2netcdf.ED2 might not be the way to go; rather, load_data will bring everything to UTC? Just tagging @bcow here.
I think the met2model.ED2 code needs a fix, but checking here first if it makes sense. @istfer @crollinson
Backstory: New visiting student Xiaowu is trying to run her site with ED2. She has created a valid NC file. However, met2model.ED2 fails on a bad “toff” value.
I looked at the files, which look good. However, I think with a positive LST offset (8, for China), then the “toff” variable in met2model.ED2 becomes negative, which breaks the code for padding time in front of the data. This would only work if LST is negative.
met2model.ED2 has these lines (from line 145)
toff <- -as.numeric(lst) * 3600 / dt
## buffer to get to GMT
slen <- seq_along(SW)
Tair <- c(rep(Tair[1], toff), Tair)[slen]
A negative toff is not cool!
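To make the failure concrete, here is a small R sketch. It reuses the toff expression quoted above; the half-hourly time step (dt = 1800 s) and the helper name toff_for are assumptions for illustration, not taken from the actual site file.

```r
# Assumed half-hourly driver time step, in seconds.
dt <- 1800

# Same expression as in met2model.ED2, wrapped in a hypothetical helper.
toff_for <- function(lst) -as.numeric(lst) * 3600 / dt

toff_west <- toff_for(-5)  # Bartlett-like offset: positive number of steps
toff_east <- toff_for(8)   # China-like offset: negative number of steps

# rep() refuses a negative 'times' argument, so the padding line cannot run.
pad_fails <- inherits(try(rep(1, toff_east), silent = TRUE), "try-error")

print(c(west = toff_west, east = toff_east))
print(pad_fails)
```

With a western offset the front padding works; with an eastern offset, rep() is asked to repeat a value a negative number of times and stops with an error.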
I think that's a pretty old bit of code; looks like no one has used ED through pecan in the Eastern Hemisphere?
My understanding was that in pecan we decided to have things in UTC before met2model and after model2netcdf last time I checked.
But I don't think it's in effect currently (i.e. data are not in UTC), and afaik most of the models do assume local time. So we still need this conversion in met2model.ED2 (although I still find it quirky that ED runs things in UTC time).
Overall, a negative toff looks like a bug to me, I guess we need some fix like:
if (toff >= 0) {
  # pad the front with the first value (a no-op when toff is 0)
  Tair <- c(rep(Tair[1], toff), Tair)[slen]
} else {
  # Tair <- Tair[(-1:toff)]
  # or if we want the full length
  Tair <- c(Tair[(-1:toff)], rep(Tair[length(Tair)], -toff))[slen]
}
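Since the same padding is applied to every driver variable, the idea could be wrapped in one function. The following is a sketch only: shift_series is a hypothetical name, not something that exists in PEcAn, and rounding toff to a whole number of steps is an assumption.

```r
# Hypothetical helper: shift a driver series by `toff` steps, keeping its length.
shift_series <- function(x, toff) {
  n <- length(x)
  toff <- as.integer(round(toff))
  if (toff > 0) {
    # Pad the front with the first value, drop the overflow at the end.
    c(rep(x[1], toff), x)[seq_len(n)]
  } else if (toff < 0) {
    # Drop the first -toff values, pad the end with the last value.
    c(x[-seq_len(-toff)], rep(x[n], -toff))[seq_len(n)]
  } else {
    # No offset: return the series unchanged.
    x
  }
}

Tair <- c(1, 2, 3, 4, 5, 6)
shifted_west <- shift_series(Tair, 2)
shifted_east <- shift_series(Tair, -2)
print(shifted_west)
print(shifted_east)
```

Handling toff == 0 as its own case matters, because an expression like Tair[(-1:toff)] would drop the first element when toff is 0.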