
Feature request for multiple tolerance values when using nearest method and sel()

Open NicWayand opened this issue 5 years ago • 4 comments

import xarray as xr
import numpy as np
import pandas as pd

# Create test data
ds = xr.Dataset()
ds.coords['lon'] = np.arange(-120,-60)
ds.coords['lat'] = np.arange(30,50)
ds.coords['time'] = pd.date_range('2018-01-01','2018-01-30')
ds['AirTemp'] = xr.DataArray(np.ones((ds.lat.size,ds.lon.size,ds.time.size)), dims=['lat','lon','time'])

target_lat = [36.83]
target_lon = [-110]
target_time = [np.datetime64('2019-06-01')]

# Nearest pulls a date too far away
ds.sel(lat=target_lat, lon=target_lon, time=target_time, method='nearest')

# Adding a tolerance for lat/lon works, but the same value is also applied to time
ds.sel(lat=target_lat, lon=target_lon, time=target_time, method='nearest', tolerance=0.5)

# Ideally, tolerance would accept a dictionary, but this currently fails
ds.sel(lat=target_lat, lon=target_lon, time=target_time, method='nearest', tolerance={'lat':0.5, 'lon':0.5, 'time':np.timedelta64(1,'D')})

Expected Output

A dataset containing the nearest values, subject to a separate tolerance on each dimension.

Problem Description

I would like tolerance to accept a dictionary, so that a different tolerance value can be given for each dimension. Before I try implementing it, I wanted to 1) check that it doesn't already exist and that no one is already working on it, and 2) get suggestions for how to proceed.

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7 | packaged by conda-forge | (default, Feb 20 2019, 02:51:38) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.9.184-0.1.ac.235.83.329.metal1.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.11.3
pandas: 0.24.1
numpy: 1.15.4
scipy: 1.2.1
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: 1.5.5
zarr: 2.2.0
cftime: 1.0.3.4
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
cyordereddict: None
dask: 1.1.2
distributed: 1.26.0
matplotlib: 3.0.3
cartopy: 0.17.0
seaborn: 0.9.0
setuptools: 40.8.0
pip: 19.0.3
conda: None
pytest: None
IPython: 7.3.0
sphinx: None

NicWayand, Aug 16 '19 19:08

We could potentially do this, and your suggested API looks sane.

But before we start, are you sure we need it? Would it suffice to index multiple times instead? I guess a motivating use-case could be point-wise indexing in a multi-dimensional dataset, e.g., to pull out lat/lon/time values matching a list of points.
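For reference, the "index multiple times" idea could look roughly like the sketch below. It assumes that one sel() call per dimension is acceptable; the helper name sel_nearest_per_dim is hypothetical and not part of xarray. The target time is moved inside the dataset's range so that the lookup succeeds.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Same shape of test data as in the issue description
ds = xr.Dataset(
    {"AirTemp": (("lat", "lon", "time"), np.ones((20, 60, 30)))},
    coords={
        "lat": np.arange(30, 50),
        "lon": np.arange(-120, -60),
        "time": pd.date_range("2018-01-01", "2018-01-30"),
    },
)


def sel_nearest_per_dim(obj, targets, tolerances):
    """Hypothetical helper: nearest-neighbour sel(), one dimension at a time.

    Each dimension gets its own tolerance; a dimension missing from
    `tolerances` is selected without any tolerance.
    """
    for dim, labels in targets.items():
        obj = obj.sel({dim: labels}, method="nearest", tolerance=tolerances.get(dim))
    return obj


tolerances = {"lat": 0.5, "lon": 0.5, "time": np.timedelta64(1, "D")}
targets = {
    "lat": [36.83],
    "lon": [-110],
    "time": [np.datetime64("2018-01-15T06:00")],
}
subset = sel_nearest_per_dim(ds, targets, tolerances)

# A date far outside the time axis is rejected by the 1-day tolerance
try:
    sel_nearest_per_dim(ds, {"time": [np.datetime64("2019-06-01")]}, tolerances)
    far_date_rejected = False
except KeyError:
    far_date_rejected = True
```

This keeps each tolerance in the units of its own coordinate, at the cost of one indexing pass per dimension.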

shoyer, Aug 16 '19 20:08

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

stale[bot], Jul 21 '21 08:07

This is still relevant.

As part of the ongoing explicit / flexible indexes refactoring, we'll probably need a more general solution to pass any selection option to the corresponding indexes.

benbovy, Jul 30 '21 20:07

Is there an update on this? I have a similar issue with lat, lon, time dimensions. I'm looking into the sel documentation and do not clearly see an option to specify independent thresholds for each of the dimensions.
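For the lat/lon/time case, one possible workaround (a sketch only, not a built-in sel option) is to select the nearest grid cell for each point without any tolerance, then compare the matched coordinates against the targets using a separate threshold per dimension. The points, thresholds, and variable names below are made up for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    {"AirTemp": (("lat", "lon", "time"), np.ones((20, 60, 30)))},
    coords={
        "lat": np.arange(30, 50),
        "lon": np.arange(-120, -60),
        "time": pd.date_range("2018-01-01", "2018-01-30"),
    },
)

# Hypothetical target points, sharing a new "points" dimension
lat_pts = xr.DataArray([36.83, 41.2], dims="points")
lon_pts = xr.DataArray([-110.0, -95.4], dims="points")
time_pts = xr.DataArray(
    np.array(["2018-01-10T06:00", "2019-06-01T00:00"], dtype="datetime64[ns]"),
    dims="points",
)

# Point-wise nearest-neighbour lookup, no tolerance applied yet
nearest = ds.sel(lat=lat_pts, lon=lon_pts, time=time_pts, method="nearest")

# Per-dimension thresholds, checked after the lookup
ok = (
    (np.abs(nearest["lat"] - lat_pts) <= 0.5)
    & (np.abs(nearest["lon"] - lon_pts) <= 0.5)
    & (np.abs(nearest["time"] - time_pts) <= np.timedelta64(1, "D"))
)

# Keep only the points that satisfy every threshold
matched = nearest.isel(points=ok.values)
```

Here the second point is dropped because its requested date lies more than a day away from the closest timestamp in the dataset.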

jenseva, Apr 29 '24 23:04