xarray
Feature request for multiple tolerance values when using nearest method and sel()
import xarray as xr
import numpy as np
import pandas as pd
# Create test data
ds = xr.Dataset()
ds.coords['lon'] = np.arange(-120,-60)
ds.coords['lat'] = np.arange(30,50)
ds.coords['time'] = pd.date_range('2018-01-01','2018-01-30')
ds['AirTemp'] = xr.DataArray(np.ones((ds.lat.size,ds.lon.size,ds.time.size)), dims=['lat','lon','time'])
target_lat = [36.83]
target_lon = [-110]
target_time = [np.datetime64('2019-06-01')]
# Nearest pulls a date too far away
ds.sel(lat=target_lat, lon=target_lon, time=target_time, method='nearest')
# Adding a tolerance for lat/lon, but it is also applied to time
ds.sel(lat=target_lat, lon=target_lon, time=target_time, method='nearest', tolerance=0.5)
# Ideally, tolerance could accept a dictionary, but this currently fails
ds.sel(lat=target_lat, lon=target_lon, time=target_time, method='nearest', tolerance={'lat':0.5, 'lon':0.5, 'time':np.timedelta64(1,'D')})
Expected Output
A dataset containing the nearest values, with each dimension matched within its own tolerance.
Problem Description
I would like tolerance to accept a dictionary, so that a different tolerance value can be given for each dimension. Before I try implementing it, I wanted to 1) check that it doesn't already exist and that nobody is already working on it, and 2) get suggestions for how to proceed.
Output of xr.show_versions()
xarray: 0.11.3
pandas: 0.24.1
numpy: 1.15.4
scipy: 1.2.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: 1.5.5
zarr: 2.2.0
cftime: 1.0.3.4
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
cyordereddict: None
dask: 1.1.2
distributed: 1.26.0
matplotlib: 3.0.3
cartopy: 0.17.0
seaborn: 0.9.0
setuptools: 40.8.0
pip: 19.0.3
conda: None
pytest: None
IPython: 7.3.0
sphinx: None
We could potentially do this, and your suggested API looks sane.
But before we start, are you sure we need it? Would it suffice to index multiple times instead? I guess a motivating use-case could be point-wise indexing in a multi-dimensional dataset, e.g., to pull out lat/lon/time values matching a list of points.
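For reference, here is a minimal sketch of the "index multiple times" idea, assuming a small stand-in dataset with float lat/lon coordinates. The helper name sel_nearest_per_dim is hypothetical and not part of xarray; it just calls sel() once per dimension so that each call gets its own tolerance.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Stand-in dataset (an assumption for illustration): float lat/lon, daily time.
ds = xr.Dataset(
    {"AirTemp": (("lat", "lon", "time"), np.ones((20, 60, 30)))},
    coords={
        "lat": np.arange(30.0, 50.0),
        "lon": np.arange(-120.0, -60.0),
        "time": pd.date_range("2018-01-01", "2018-01-30"),
    },
)


def sel_nearest_per_dim(obj, indexers, tolerances):
    """Hypothetical helper: nearest-neighbour sel(), one dimension at a time.

    Dimensions missing from `tolerances` are selected without a tolerance.
    """
    for dim, labels in indexers.items():
        obj = obj.sel({dim: labels}, method="nearest", tolerance=tolerances.get(dim))
    return obj


tolerances = {"lat": 0.5, "lon": 0.5, "time": np.timedelta64(1, "D")}

# All three targets lie within their tolerances, so this returns a 1x1x1 dataset.
hit = sel_nearest_per_dim(
    ds,
    {
        "lat": [36.83],
        "lon": [-110.0],
        "time": [np.datetime64("2018-01-10T06:00:00", "ns")],
    },
    tolerances,
)

# A date more than one day from any time step is rejected with a KeyError.
try:
    sel_nearest_per_dim(
        ds,
        {"lat": [36.83], "lon": [-110.0], "time": [np.datetime64("2019-06-01", "ns")]},
        tolerances,
    )
    out_of_range_rejected = False
except KeyError:
    out_of_range_rejected = True
```

This selects the outer product of the requested labels. For point-wise indexing, the same chaining should also work with DataArray indexers that share a common dimension, but that case is not covered by this sketch.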
In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.
If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.
This is still relevant.
As part of the ongoing explicit / flexible indexes refactoring, we'll probably need a more general solution to pass any selection option to the corresponding indexes.
Is there an update on this? I have a similar issue with lat, lon, and time dimensions. I looked through the sel documentation and don't see an option to specify independent thresholds for each dimension.
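Until something like that exists, one possible workaround for point-wise lookups is to select the nearest values without any tolerance and then check each dimension's distance afterwards. The sketch below assumes a stand-in dataset and a hypothetical points table; the variable names and thresholds are made up for illustration. Unlike tolerance=, it masks points that are too far away instead of raising.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Stand-in dataset (an assumption for illustration): float lat/lon, daily time.
ds = xr.Dataset(
    {"AirTemp": (("lat", "lon", "time"), np.ones((20, 60, 30)))},
    coords={
        "lat": np.arange(30.0, 50.0),
        "lon": np.arange(-120.0, -60.0),
        "time": pd.date_range("2018-01-01", "2018-01-30"),
    },
)

# Hypothetical target points; the second one is far outside the time range.
points = xr.Dataset(
    {
        "lat": ("points", [36.83, 41.2]),
        "lon": ("points", [-110.0, -75.4]),
        "time": (
            "points",
            np.array(["2018-01-10T06:00", "2019-06-01T00:00"], dtype="datetime64[ns]"),
        ),
    }
)

# Point-wise nearest-neighbour lookup: the indexers share the "points" dimension,
# so the result has one value per point rather than an outer product.
nearest = ds.sel(lat=points.lat, lon=points.lon, time=points.time, method="nearest")

# Compare the matched coordinates with the targets, one threshold per dimension.
within = (
    (abs(nearest.lat - points.lat) <= 0.5)
    & (abs(nearest.lon - points.lon) <= 0.5)
    & (abs(nearest.time - points.time) <= np.timedelta64(1, "D"))
)

# Points outside any tolerance become NaN in the data variables.
masked = nearest.where(within)
```

If dropping the failed points is preferable to masking them, nearest.isel(points=within.values) keeps only the rows that passed every check.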