cfgrib
cfgrib copied to clipboard
Issue with NCEP seasonal forecasts
I think this issue is not really an issue caused by cfgrib
but I think it's relevant because in this case it forces me to switch to the NetCDF version of the files on the CDS.
I downloaded an NCEP forecast monthly, start date january and lead times 2, 3 and 4. The dimension are very big, due probably to the initialization method of the NCEP, if I open the GRIB with xarray I get:
<xarray.Dataset>
Dimensions: (latitude: 180, longitude: 360, number: 28, step: 39, time: 644)
Coordinates:
* number (number) int64 0 1 2 3 4 5 6 7 8 ... 19 20 21 22 23 24 25 26 27
* time (time) datetime64[ns] 1992-12-02 ... 2015-01-01T00:18:00
* step (step) timedelta64[ns] 59 days 60 days ... 150 days 151 days
surface int64 ...
* latitude (latitude) float64 89.5 88.5 87.5 86.5 ... -87.5 -88.5 -89.5
* longitude (longitude) float64 0.5 1.5 2.5 3.5 ... 356.5 357.5 358.5 359.5
valid_time (time, step) datetime64[ns] ...
Data variables:
t2m (number, time, step, latitude, longitude) float32 ...
Attributes:
GRIB_edition: 1
GRIB_centre: kwbc
GRIB_centreDescription: US National Weather Service - NCEP
GRIB_subCentre: 98
Conventions: CF-1.7
institution: US National Weather Service - NCEP
history: 2021-03-26T16:33:44 GRIB to CDM+CF via cfgrib-0....
I cannot do anything with this file, because I would need 170 GB of memory to deal with it. Instead, if I convert it to NetCDF it becomes:
xarray.Dataset>
Dimensions: (latitude: 180, longitude: 360, number: 28, time: 69)
Coordinates:
* longitude (longitude) float32 0.5 1.5 2.5 3.5 ... 356.5 357.5 358.5 359.5
* latitude (latitude) float32 89.5 88.5 87.5 86.5 ... -87.5 -88.5 -89.5
* number (number) int32 0 1 2 3 4 5 6 7 8 9 ... 19 20 21 22 23 24 25 26 27
* time (time) datetime64[ns] 1993-02-01 1993-03-01 ... 2015-04-01
Data variables:
t2m (time, number, latitude, longitude) float32 ...
Attributes:
Conventions: CF-1.6
history: 2021-03-26 15:32:40 GMT by grib_to_netcdf-2.21.0: grib_to_n...
I was using the GRIB because in my workflow I am computing the mean on the 'step' dimension to have a seasonal average, but in this case I get an error due to lack of memory to perform the computation. Why the NetCDF is so compact? What's the magic behind it?
@matteodefelice monthly datasets are tricky to translate due to fact that "a month" is no a proper time interval (see the discussion here for example: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#time-coordinate)
You may try to use the poorly documented time_dims
option that was added exactly to handle CDS monthly seasonal products: https://github.com/ecmwf/cfgrib/issues/97#issuecomment-557190695
I didn't test it on NCEP though.
If you want to know more, the whole conversation in the issue is a good read.
Thanks a lot. It seems that if I use verifying_time
in time_dims
I get the same coordinate used by grib_to_netcdf
. This should solve my problem BUT...why can't I use only verifying_time
in time_dims
?
If I use backend_kwargs=dict(time_dims = ('time', 'verifying_time'))
everything works fine but if I leave only the second:
ValueError: time_dims 'verifying_time' not a subset of ['time', 'step', 'valid_time', 'verifying_time', 'forecastMonth', 'indexing_time']
The error message doesn't help a lot. What is happening here?
SOLVED: it's the annoying Python-comma...If I use:
`backend_kwargs=dict(time_dims = ('verifying_time',))
then the check in dataset.py
works :)
Do you see any potential issue in doing this?