xarray
Inconsistent interpolation based on data type
What happened?
Depending on the data type (float64 versus float32), interpolate gives different results for the same data.
What did you expect to happen?
The example code prints the two arrays pasted below. The first array ends with a 1, while the second array, which is based on the float32 dataset, is all NaN. I was expecting each array to have six NaNs and one numerical value.
[nan nan nan nan nan nan 1.]
[nan nan nan nan nan nan nan]
Minimal Complete Verifiable Example
import xarray as xr
import numpy as np
import pandas as pd
time_range = pd.date_range(start='2024-02-20T12', periods=2, freq='6H')
data1 = xr.DataArray([np.nan,1], dims='time', coords={'time': time_range})
data2 = data1.astype('float32')
print(data1.resample({"time":"1H"}).interpolate("linear").values)
print(data2.resample({"time":"1H"}).interpolate("linear").values)
MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
- [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
No response
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-513.11.1.el8_9.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development

xarray: 2024.1.1
pandas: 2.1.1
numpy: 1.26.2
scipy: 1.11.3
netCDF4: 1.6.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.1
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: 1.3.7
dask: 2023.10.0
distributed: 2023.10.0
matplotlib: 3.8.0
cartopy: 0.22.0
seaborn: None
numbagg: None
fsspec: 2023.9.2
cupy: None
pint: 0.22
sparse: 0.14.0
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.3.2
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None
I'm not sure what's causing this, but I can confirm that it reproduces for me. Any ideas?
Simpler repro (no datetime, no resample) - but I don't know why it happens, either.
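import numpy as np  # needed for np.nan
import xarray as xr

data1 = xr.DataArray([np.nan, 1], dims='x', coords={'x': [0, 6]})
data2 = data1.astype('float32')
target = [0, 6]
# Interpolate at the original coordinates and compare the two dtypes.
print(data1.interp(x=target).values)  # float64 input
print(data2.interp(x=target).values)  # float32 input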
OK, this is a scipy problem. Do you want to raise an issue in scipy?
import scipy as sp
import numpy as np
xi = np.array([0, 6])
yi = np.array([np.nan, 1])
sp.interpolate.interp1d(xi, yi, kind="linear")(xi)
sp.interpolate.interp1d(xi, yi.astype(np.float32), kind="linear")(xi)
(the xarray question here is - why do we choose to interpolate using scipy and not numpy?)
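For comparison, here is a minimal sketch of the same two-point case run through numpy's np.interp instead of scipy. It only illustrates the question above and makes no claim about how xarray should implement interp. np.interp converts its inputs to float64 internally, so both dtypes go through the same code path.

```python
import numpy as np

xi = np.array([0, 6])
yi = np.array([np.nan, 1])

# Evaluate at the original sample points for both dtypes.
out64 = np.interp(xi, xi, yi)
out32 = np.interp(xi, xi, yi.astype(np.float32))

print(out64)
print(out32)
# Both calls agree, including the position of the NaN.
print(np.array_equal(out64, out32, equal_nan=True))
```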
Issue now posted on scipy repo: https://github.com/scipy/scipy/issues/20152
From the scipy documentation, "We note that scipy.interpolate does not support interpolation with missing data. Two popular ways of representing missing data are using masked arrays of the numpy.ma library, and encoding missing values as not-a-number, NaN."
If scipy does not support interpolation with missing data, does that mean that, by extension, xarray does not support it either?
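Given the scipy note quoted above, one common way to sidestep the question is to drop the missing samples before interpolating. The sketch below uses plain numpy. interp_ignoring_nan is a hypothetical helper, not part of xarray or scipy, and returning NaN for points outside the range of valid samples is an assumption about the desired behaviour.

```python
import numpy as np

def interp_ignoring_nan(x, y, x_new):
    """Linearly interpolate y(x) at x_new using only the non-NaN samples.

    Hypothetical helper for illustration; not an xarray or scipy API.
    Assumes x is sorted in increasing order, as np.interp requires.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)  # float32 input is promoted to float64
    valid = ~np.isnan(y)
    if not valid.any():
        return np.full(np.shape(x_new), np.nan)
    # Points outside the range of valid samples are returned as NaN
    # rather than extrapolated (an assumption made for this sketch).
    return np.interp(x_new, x[valid], y[valid], left=np.nan, right=np.nan)

x = [0, 6, 12]
y = np.array([np.nan, 1, 3], dtype=np.float32)
result = interp_ignoring_nan(x, y, [0, 6, 9, 12])
print(result)
```

In xarray terms, this roughly corresponds to calling dropna along the interpolated dimension before interp, which requires at least two valid samples to be left.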