earthkit-data icon indicating copy to clipboard operation
earthkit-data copied to clipboard

Implement gridspec

Open sandorkertesz opened this issue 1 year ago • 2 comments

The main purpose of gridspec is to describe the grid structure of a field. Requirements:

  • dict-like object
  • should follow the MARS vocabulary whenever possible using keywords like grid and area

Examples:

{"grid": "O1280"}
{"grid": [1, 1], "area": [60,-20,40,30]}

Interpolation

The idea is that in the yet-to-be-developed earthkit.regrid module the gridspec would specify both the input and output grids and we would able to perform an interpolation purely on a numpy array. E.g.:

import earthkit.data
import earthkit.regrid

ds = earthkit.data.from_source("file", "data_2x2.grib")

# interpolate the first field onto a 5x5 degree regular_ll grid
v = ds[0].values
gs = ds[0].metadata().gridspec
v_new, gs_new = earthkit.regrid.interpolate(v, gs, grid=[5, 5])

From the results we should be able to create a new Fieldlist like this:

ds_new = earthkit.data.FieldList.from_numpy(v_new, ds[0].metadata(), gs_new)

Now here ds[0].metadata() represents the the original "2x2" field while gs_new is the gridspec of the new "5x5" field. For users of ds_new all metadata queries should give results consistent with the new grid:

ds_new[0].metadata("numberOfDataPoints") # -> 2664
ds_new[0].metadata("latitudeOfFirstGridPointInDegrees") # -> 90
ds_new[0].shape # -> (37, 72)

The simplest solution to achieve this is to internally update the metadata object with the gridspec and create a new one. This would require the cloning of a new grib handle (since metadata stores a grib handle internally) from the gridspec. So here the task is to turn the gridspec (in the example it is as simple as {"grid: [5,5]}) into the relevant set of grib metadata keys and create the new handle with them. This is already performed in MIR, which creates the new grib handle from the generated metadata keys with the following call:

 auto* result =
    codes_grib_util_set_spec(h, &info.grid, &info.packing, flags, values.data(), values.size(), &err);

The question is if we want to duplicate the MIR code in Python or want to delegate this task straight to MIR (i.e. to earthkit.regrid). Please note that codes_grib_util_set_spec is not available in the ecCodes Python interface.

Work so far

The very first implementation in branch feature/gridspec works like this:

>>> import earthkit.data
>>> ds = earthkit.data.from_source("file", "tests/data/gridspec/t_75_-60_10_40_5x5.grib1")
>>> ds[0].metadata().gridspec
GridSpec({'type': 'regular_ll', 
   'grid': [0.083, 0.083], 
   'area': [41.188, 22.0, 34.521, 30.0], 
   'j_points_consecutive': 0, 
   'i_scans_negatively': 0, 
   'j_scans_positively': 1})
>>>

Generating a new GRIB handle from gridspec is implemented for regular_ll grids. An experimental "interpolation" notebook example showcasing this can be found here: https://github.com/ecmwf/earthkit-data/blob/feature/gridspec/docs/experimental/interpolate.ipynb

Excerpt from the notebook (here interpolate is a method written in scipy just to make the testing work):

# perform interpolation onto another grid using values array and gridspec
vals = ds[0].values
gs = ds[0].metadata().gridspec
vals_new, gs_new = interpolate(vals, gs, grid=[2,2], area=[70.0, -50.0, 20.0, 10.0])

# generate new meatadata from the resulting gridspec
md_new = ds[0].metadata().override(gridspec=gs_new)

# generate new fieldlist from the new values array and metadata
ds_new = FieldList.from_numpy(vals_new.flatten(), md_new)

TODO:

  • decide if type is the right name for the grid type
  • find a naming convention for keys not part of the MARS vocabulary like iScansNegatively. At the moment snake_case is used. E.g. iScansNegatively -> i_scans_negatively.
  • should we use more descriptive, format independent keys? e.g. instead of iScansNegatively we could use i_scan_direction with values of -1 or 1
  • only add keys when they have a non-default value

TESTING: Gridspec generation can easily be tested with a dictionary containing the relevant grib metadata values for a given field:

from earthkit.data.readers.grib.gridspec import make_gridspec

md = {'gridType': 'regular_ll', 'Ni': 97, 'Nj': 81,
   'iDirectionIncrementInDegrees': 0.083, 
   'jDirectionIncrementInDegrees': 0.083, 
   'latitudeOfFirstGridPointInDegrees': 34.521,
   'latitudeOfLastGridPointInDegrees': 41.188, 
   'longitudeOfFirstGridPointInDegrees': 22.0, 
   'longitudeOfLastGridPointInDegrees': 30.0, 
   'iScansNegatively': 0, 
   'jScansPositively': 1, 
   'jPointsAreConsecutive': 0}

gs = make_gridspec(md)
print(gs)

output:

GridSpec({'type': 'regular_ll', 'grid': [0.083, 0.083], 'area': [41.188, 22.0, 34.521, 30.0], '
 j_points_consecutive': 0, 'i_scans_negatively': 0, 'j_scans_positively': 1})

sandorkertesz avatar Sep 04 '23 11:09 sandorkertesz