xESMF icon indicating copy to clipboard operation
xESMF copied to clipboard

Comparison with xarray's built-in interp()

Open JiaweiZhuang opened this issue 7 years ago • 9 comments

Recently realized that xarray has a nice new interpolation module added by @crusaderky and @fujiisoup in pydata/xarray#2079 (also cc @shoyer, @rabernat).

It does have some overlap with xESMF (fortunately not too much), so I think it would be necessary to:

  • Compare their advantages & limitations
  • Identify their proper use cases
  • Better define the development roadmap and avoid duplication of effort

interp() wraps scipy.interpolate; while xESMF wraps ESMPy's regridding functionality. The subtle difference between "interpolation" and "regridding" is that, the former often refers to traditional 1D interpolation (sometimes N-D), mostly in Cartesian space; while the latter specifically means geospatial data on Earth's sphere.

I personally think interp() is a great fit for:

  • Interpolation over 1D coordinate (e.g. vertical layers, time). I've been using Scipy for vertical interpolation, too. ESMPy does support 3D grid, but this is generally an overkill.
  • Data that are not on Earth's sphere. Say the output from any other physical models. ESMPy can actually handle Cartesian coordinates, but everyone seems to only use spherical coordinates
  • High-dimensional regular grid (>=4D? rarely seen in Earth science but can occur in other physical sciences or machine learning). Seems like interp's API tries to generalize to arbitrary dimensions, while xESMF is specific to horizontal regridding on the sphere.
  • Sampling over a trajectory via "Advanced Interpolation". ESMPy does have a similar support via LocStream¶ but I personally don't use it. Glad that xarray has native support for this feature.

For geospatial regridding tasks, xESMF has some important strengths. Many already reviewed in the docs, but more specifically:

  • Performance, especially with large data. xESMF reuses weights but scipy.interpolate does not. This simple test shows that xESMF is 16x faster than interp(), once the weights are computed (computing weights is also faster than interp). Indeed, this performance gap will be narrowed down on distributed cloud platforms, where the I/O time dominates (pangeo-data/pangeo#334).
  • Curvilinear grid (pydata/xarray#2281). This is the major reason why I wrote xESMF...
  • Conservative algorithm, to conserve the integral for density-like fields such as air density, heat flux, emission intensity... (This algorithm is used everywhere in Earth science but is never taught in numerical analysis classes. Scipy is basically "everything in a numerical analysis textbook", so unsurprisingly it has no conservative scheme.)

In short, interp is a general-purpose interpolation module; xESMF is a geospatial regridding package targeting at Earth science needs. Looks like their objectives can be distinguished. Should we consider merging some efforts? Or just perhaps let them evolve independently?

JiaweiZhuang avatar Jul 12 '18 23:07 JiaweiZhuang

I actually didn't work on interp. What I did is http://xarray-extras.readthedocs.io/en/latest/api/interpolate.html, which is based on scipy BSpline. The key feature I needed was to work with an n-dimensional y ; i use it to interpolate the time axis of a discount curve under Monte Carlo stress (sizes=(time:100, scenario: 500000))

crusaderky avatar Jul 13 '18 06:07 crusaderky

@crusaderky Got it -- just realized that the actually PR is pydata/xarray#2104

JiaweiZhuang avatar Jul 13 '18 17:07 JiaweiZhuang

In short, interp is a general-purpose interpolation module; xESMF is a geospatial regridding package targeting at Earth science needs. Looks like their objectives can be distinguished

I agree!

Should we consider merging some efforts?

In the long term, we want an external interface that allows for extending xarray with custom index/grid objects as an explicit part of our data model, e.g., for geospatial indexing. This would allow for caching some indexing/regridding computations and potentially allow even for extending interp in xarray by third-party libraries.

shoyer avatar Jul 13 '18 18:07 shoyer

I personally think interp() is a great fit for:

  • Sampling over a trajectory via "Advanced Interpolation". ESMPy does have a similar support via LocStream¶ but I personally don't use it. Glad that xarray has native support for this feature.

@JiaweiZhuang I think what you are saying here in that ESMPy allows a user to interpolate to coordinate pair locations [lon1,lat1], [lon2,lat2] instead of always interpolating to grids of the combinations of these longitudes and latitudes. If so, is this behavior accessible through xESMF? My use case is, for example, interpolating model output to specific buoy locations for comparison, where the buoy stations are not located in a gridded array. I can subsample the output from xESMF to get this behavior, but presumably it would be more efficient to only request interpolation to the desired locations.

kthyng avatar Sep 16 '20 16:09 kthyng

@kthyng see https://xesmf.readthedocs.io/en/latest/notebooks/Using_LocStream.html I used it successfully. You'll have to rename dimensions to lat, lon which is unfortunate.

dcherian avatar Sep 16 '20 17:09 dcherian

@dcherian Ah! I missed that notebook, sorry. I will try it now. At the time I read through those I didn't know what LocStream was. I am able to use the basic xESMF functionality and then use advanced indexing to get out the diagonal of the returned array of points, and since it's lazily evaluated it's probably not too bad, but more direct would be better!

kthyng avatar Sep 16 '20 17:09 kthyng

@dcherian, @JiaweiZhuang Well that was easy! Thanks, I have the behavior I was looking for!

I guess as an unsolicited comment, I would say that it's taken me awhile to find that xESMF will work for me and my use cases since I have not come from the ESMF community and don't recognize the terms used. I'll provide examples for working with ROMS output (at least in https://github.com/kthyng/xroms/https://github.com/hetland/xroms) so hopefully this will help spread the word using different terminology!

kthyng avatar Sep 16 '20 17:09 kthyng

ya it would be good to have an equivalent of https://xarray.pydata.org/en/stable/howdoi.html. @kthyng can you open an issue at the more active fork of xesmf: https://github.com/pangeo-data/xESMF/issues

dcherian avatar Sep 16 '20 17:09 dcherian

@dcherian I can work on this after I figure out some interpolation better than I currently do. It's on my to-do list!

kthyng avatar Sep 17 '20 21:09 kthyng