cfgrib

Issue with NCEP seasonal forecasts

Open matteodefelice opened this issue 3 years ago • 2 comments

I don't think this problem is actually caused by cfgrib, but it seems relevant here because it forces me to switch to the NetCDF version of the files on the CDS.

I downloaded an NCEP monthly forecast with a January start date and lead times 2, 3 and 4. The dimensions are very large, probably because of the NCEP initialization method. If I open the GRIB with xarray I get:

<xarray.Dataset>
Dimensions:     (latitude: 180, longitude: 360, number: 28, step: 39, time: 644)
Coordinates:
  * number      (number) int64 0 1 2 3 4 5 6 7 8 ... 19 20 21 22 23 24 25 26 27
  * time        (time) datetime64[ns] 1992-12-02 ... 2015-01-01T00:18:00
  * step        (step) timedelta64[ns] 59 days 60 days ... 150 days 151 days
    surface     int64 ...
  * latitude    (latitude) float64 89.5 88.5 87.5 86.5 ... -87.5 -88.5 -89.5
  * longitude   (longitude) float64 0.5 1.5 2.5 3.5 ... 356.5 357.5 358.5 359.5
    valid_time  (time, step) datetime64[ns] ...
Data variables:
    t2m         (number, time, step, latitude, longitude) float32 ...
Attributes:
    GRIB_edition:            1
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP
    GRIB_subCentre:          98
    Conventions:             CF-1.7
    institution:             US National Weather Service - NCEP
    history:                 2021-03-26T16:33:44 GRIB to CDM+CF via cfgrib-0....

I cannot do anything with this file, because I would need 170 GB of memory to deal with it. Instead, if I convert it to NetCDF it becomes:

<xarray.Dataset>
Dimensions:    (latitude: 180, longitude: 360, number: 28, time: 69)
Coordinates:
  * longitude  (longitude) float32 0.5 1.5 2.5 3.5 ... 356.5 357.5 358.5 359.5
  * latitude   (latitude) float32 89.5 88.5 87.5 86.5 ... -87.5 -88.5 -89.5
  * number     (number) int32 0 1 2 3 4 5 6 7 8 9 ... 19 20 21 22 23 24 25 26 27
  * time       (time) datetime64[ns] 1993-02-01 1993-03-01 ... 2015-04-01
Data variables:
    t2m        (time, number, latitude, longitude) float32 ...
Attributes:
    Conventions:  CF-1.6
    history:      2021-03-26 15:32:40 GMT by grib_to_netcdf-2.21.0: grib_to_n...

I was using the GRIB because in my workflow I compute the mean over the 'step' dimension to get a seasonal average, but in this case the computation fails for lack of memory. Why is the NetCDF so compact? What's the magic behind it?
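The sizes implied by the two dumps above can be checked with a quick back-of-the-envelope calculation. This is only a sketch: it assumes the array is fully materialized in memory as dense float32 (4 bytes per value), and `dense_size_bytes` is a helper name made up for illustration.

```python
from math import prod

BYTES_PER_FLOAT32 = 4  # t2m is float32 in both dumps


def dense_size_bytes(dims):
    """Size of a dense float32 array with the given dimension lengths (hypothetical helper)."""
    return prod(dims.values()) * BYTES_PER_FLOAT32


# Dimension lengths copied from the two xarray dumps in this post.
grib_dims = dict(number=28, time=644, step=39, latitude=180, longitude=360)
netcdf_dims = dict(time=69, number=28, latitude=180, longitude=360)

grib_bytes = dense_size_bytes(grib_dims)
netcdf_bytes = dense_size_bytes(netcdf_dims)

print(f"GRIB via cfgrib: {grib_bytes / 2**30:.1f} GiB")
print(f"NetCDF:          {netcdf_bytes / 2**30:.2f} GiB")
print(f"ratio:           {grib_bytes / netcdf_bytes:.0f}x")
```

The GRIB view comes out at roughly 170 GiB and the NetCDF at under 0.5 GiB. The whole difference is the 644 x 39 (time, step) grid versus the 69-entry time axis; the other dimensions are identical.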

matteodefelice, Mar 26 '21 15:03

@matteodefelice monthly datasets are tricky to translate due to the fact that "a month" is not a proper time interval (see the discussion here for example: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#time-coordinate)

You may try to use the poorly documented time_dims option that was added exactly to handle CDS monthly seasonal products: https://github.com/ecmwf/cfgrib/issues/97#issuecomment-557190695

I didn't test it on NCEP though.

If you want to know more, the whole conversation in the issue is a good read.

alexamici, Mar 26 '21 20:03

Thanks a lot. It seems that if I use verifying_time in time_dims I get the same coordinate used by grib_to_netcdf. This should solve my problem BUT...why can't I use only verifying_time in time_dims? If I use backend_kwargs=dict(time_dims = ('time', 'verifying_time')) everything works fine but if I leave only the second:

ValueError: time_dims 'verifying_time' not a subset of ['time', 'step', 'valid_time', 'verifying_time', 'forecastMonth', 'indexing_time']

The error message doesn't help a lot. What is happening here?

SOLVED: it's the annoying Python trailing comma for single-element tuples... If I use:

`backend_kwargs=dict(time_dims=('verifying_time',))`

then the check in dataset.py works :)
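For anyone hitting the same error, here is a minimal sketch of why the comma matters. `check_time_dims` is a hypothetical stand-in that I wrote to mimic a subset check like the one the error message suggests; it is not copied from cfgrib, and the key list is simply taken from the error above.

```python
# Keys copied from the ValueError message quoted earlier in this thread.
ALL_REF_TIME_KEYS = [
    "time", "step", "valid_time", "verifying_time", "forecastMonth", "indexing_time",
]


def check_time_dims(time_dims):
    """Hypothetical stand-in for the validation step (an assumption, not cfgrib's code)."""
    if not set(time_dims).issubset(ALL_REF_TIME_KEYS):
        raise ValueError(
            "time_dims %r not a subset of %r" % (time_dims, ALL_REF_TIME_KEYS)
        )
    return tuple(time_dims)


not_a_tuple = ("verifying_time")  # parentheses alone: still a plain str
one_tuple = ("verifying_time",)   # trailing comma: a 1-element tuple

# set() over a str yields its individual characters, so the subset test fails.
try:
    check_time_dims(not_a_tuple)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(type(not_a_tuple).__name__, "->", error_message)
print(type(one_tuple).__name__, "->", check_time_dims(one_tuple))
```

With the bare string, `set(time_dims)` is a set of single letters, none of which is a valid key. That explains why the message prints `'verifying_time'` and still claims it is not in the list. Writing the value as a list, `['verifying_time']`, would avoid the comma pitfall in this sketch, though I have not checked whether cfgrib accepts a list there.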

Do you see any potential issue in doing this?

matteodefelice, Mar 26 '21 20:03