xESMF
                                
                                
                                
                                    xESMF copied to clipboard
                            
                            
                            
                        Comparison with xarray's built-in interp()
Recently realized that xarray has a nice new interpolation module added by @crusaderky and @fujiisoup in pydata/xarray#2079 (also cc @shoyer, @rabernat).
It does have some overlap with xESMF (fortunately not too much), so I think it would be necessary to:
- Compare their advantages & limitations
 - Identify their proper use cases
 - Better define the development roadmap and avoid duplication of effort
 
interp() wraps scipy.interpolate; while xESMF wraps ESMPy's regridding functionality. The subtle difference between "interpolation" and "regridding" is that, the former often refers to traditional 1D interpolation (sometimes N-D), mostly in Cartesian space; while the latter specifically means geospatial data on Earth's sphere.
I personally think interp() is a great fit for:
- Interpolation over 1D coordinate (e.g. vertical layers, time). I've been using Scipy for vertical interpolation, too. ESMPy does support 3D grid, but this is generally an overkill.
 - Data that are not on Earth's sphere. Say the output from any other physical models. ESMPy can actually handle Cartesian coordinates, but everyone seems to only use spherical coordinates
 - High-dimensional regular grid (>=4D? rarely seen in Earth science but can occur in other physical sciences or machine learning). Seems like 
interp's API tries to generalize to arbitrary dimensions, while xESMF is specific to horizontal regridding on the sphere. - Sampling over a trajectory via "Advanced Interpolation". ESMPy does have a similar support via LocStream¶ but I personally don't use it. Glad that xarray has native support for this feature.
 
For geospatial regridding tasks, xESMF has some important strengths. Many already reviewed in the docs, but more specifically:
- Performance, especially with large data. xESMF reuses weights but 
scipy.interpolatedoes not. This simple test shows that xESMF is 16x faster thaninterp(), once the weights are computed (computing weights is also faster thaninterp). Indeed, this performance gap will be narrowed down on distributed cloud platforms, where the I/O time dominates (pangeo-data/pangeo#334). - Curvilinear grid (pydata/xarray#2281). This is the major reason why I wrote xESMF...
 - Conservative algorithm, to conserve the integral for density-like fields such as air density, heat flux, emission intensity... (This algorithm is used everywhere in Earth science but is never taught in numerical analysis classes. Scipy is basically "everything in a numerical analysis textbook", so unsurprisingly it has no conservative scheme.)
 
In short, interp is a general-purpose interpolation module; xESMF is a geospatial regridding package targeting at Earth science needs. Looks like their objectives can be distinguished. Should we consider merging some efforts? Or just perhaps let them evolve independently?
I actually didn't work on interp. What I did is http://xarray-extras.readthedocs.io/en/latest/api/interpolate.html, which is based on scipy BSpline. The key feature I needed was to work with an n-dimensional y ; i use it to interpolate the time axis of a discount curve under Monte Carlo stress (sizes=(time:100, scenario: 500000))
@crusaderky Got it -- just realized that the actually PR is pydata/xarray#2104
In short, interp is a general-purpose interpolation module; xESMF is a geospatial regridding package targeting at Earth science needs. Looks like their objectives can be distinguished
I agree!
Should we consider merging some efforts?
In the long term, we want an external interface that allows for extending xarray with custom index/grid objects as an explicit part of our data model, e.g., for geospatial indexing. This would allow for caching some indexing/regridding computations and potentially allow even for extending  interp in xarray by third-party libraries.
I personally think
interp()is a great fit for:
- Sampling over a trajectory via "Advanced Interpolation". ESMPy does have a similar support via LocStream¶ but I personally don't use it. Glad that xarray has native support for this feature.
 
@JiaweiZhuang I think what you are saying here in that ESMPy allows a user to interpolate to coordinate pair locations [lon1,lat1], [lon2,lat2] instead of always interpolating to grids of the combinations of these longitudes and latitudes. If so, is this behavior accessible through xESMF? My use case is, for example, interpolating model output to specific buoy locations for comparison, where the buoy stations are not located in a gridded array. I can subsample the output from xESMF to get this behavior, but presumably it would be more efficient to only request interpolation to the desired locations.
@kthyng  see https://xesmf.readthedocs.io/en/latest/notebooks/Using_LocStream.html I used it successfully. You'll have to rename dimensions to lat, lon which is unfortunate.
@dcherian Ah! I missed that notebook, sorry. I will try it now. At the time I read through those I didn't know what LocStream was. I am able to use the basic xESMF functionality and then use advanced indexing to get out the diagonal of the returned array of points, and since it's lazily evaluated it's probably not too bad, but more direct would be better!
@dcherian, @JiaweiZhuang Well that was easy! Thanks, I have the behavior I was looking for!
I guess as an unsolicited comment, I would say that it's taken me awhile to find that xESMF will work for me and my use cases since I have not come from the ESMF community and don't recognize the terms used. I'll provide examples for working with ROMS output (at least in https://github.com/kthyng/xroms/https://github.com/hetland/xroms) so hopefully this will help spread the word using different terminology!
ya it would be good to have an equivalent of https://xarray.pydata.org/en/stable/howdoi.html. @kthyng can you open an issue at the more active fork of xesmf: https://github.com/pangeo-data/xESMF/issues
@dcherian I can work on this after I figure out some interpolation better than I currently do. It's on my to-do list!