
Inconsistent interpolation based on data type

Status: Open. wilson0028 opened this issue 1 year ago • 6 comments

What happened?

Depending on the data type (float64 vs. float32), interpolate gives different results.

What did you expect to happen?

The example code prints two arrays, pasted below. The first array ends with a 1, while the second array, which is based on the float32 dataset, is all NaN. I expected each array to contain six NaNs followed by one numerical value.

[nan nan nan nan nan nan 1.] [nan nan nan nan nan nan nan]

Minimal Complete Verifiable Example

import xarray as xr
import numpy as np
import pandas as pd

time_range = pd.date_range(start='2024-02-20T12', periods=2, freq='6H')

data1 = xr.DataArray([np.nan,1], dims='time', coords={'time': time_range})
data2 = data1.astype('float32')

print(data1.resample({"time":"1H"}).interpolate("linear").values)
print(data2.resample({"time":"1H"}).interpolate("linear").values)

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-513.11.1.el8_9.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development

xarray: 2024.1.1
pandas: 2.1.1
numpy: 1.26.2
scipy: 1.11.3
netCDF4: 1.6.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.1
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: 1.3.7
dask: 2023.10.0
distributed: 2023.10.0
matplotlib: 3.8.0
cartopy: 0.22.0
seaborn: None
numbagg: None
fsspec: 2023.9.2
cupy: None
pint: 0.22
sparse: 0.14.0
flox: None
numpy_groupies: None
setuptools: 68.2.2
pip: 23.3.2
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None

wilson0028 (Feb 21 '24 00:02)

Thanks for opening your first issue here at xarray! Be sure to follow the issue template! If you have an idea for a solution, we would really welcome a Pull Request with proposed changes. See the Contributing Guide for more. It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better. Thank you!

welcome[bot] (Feb 21 '24 00:02)

Not sure what's causing this, but I can confirm that I can reproduce it. Any ideas?

max-sixty (Feb 26 '24 05:02)

Simpler repro (no datetime, no resample), but I don't know why it happens either.

import numpy as np
import xarray as xr

data1 = xr.DataArray([np.nan, 1], dims='x', coords={'x': [0, 6]})  # float64 values
data2 = data1.astype('float32')                                    # same values as float32
target = [0, 6]
print(data1.interp(x=target).values)
print(data2.interp(x=target).values)

mathause (Feb 26 '24 16:02)

OK, this is a scipy problem. Do you want to raise an issue in scipy?

import numpy as np
from scipy.interpolate import interp1d

xi = np.array([0, 6])
yi = np.array([np.nan, 1])
print(interp1d(xi, yi, kind="linear")(xi))                     # float64 values
print(interp1d(xi, yi.astype(np.float32), kind="linear")(xi))  # float32 values

(The xarray question here is: why do we choose to interpolate using scipy rather than numpy?)

mathause (Feb 26 '24 16:02)
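One possible explanation for the float32/float64 difference, offered as an assumption to check against the installed SciPy version rather than a confirmed diagnosis: for 1-D data whose x and y are float64 or int, interp1d(kind="linear") appears to delegate to numpy.interp, while other dtypes such as float32 fall back to a generic slope formula. The numpy-only sketch below imitates both paths on the data from the simpler reproducer. The helper slope_formula is hypothetical and written for illustration only; it is not SciPy's code.

```python
import numpy as np

# Data from the simpler reproducer: a NaN at x=0 and a valid value at x=6
x = np.array([0.0, 6.0])
y = np.array([np.nan, 1.0])
x_new = np.arange(7.0)  # query points 0, 1, ..., 6

# Path A: numpy.interp returns the stored sample value when x_new equals a
# sample point, so the NaN slope is never used at x = 6
path_a = np.interp(x_new, x, y)


def slope_formula(x, y, x_new):
    """Hypothetical helper: evaluate y_lo + slope * (x_new - x_lo) at every point."""
    # Index of the right-hand sample of the interval containing each query point
    hi = np.clip(np.searchsorted(x, x_new), 1, len(x) - 1)
    lo = hi - 1
    slope = (y[hi] - y[lo]) / (x[hi] - x[lo])
    return slope * (x_new - x[lo]) + y[lo]


# Path B: the NaN in y makes the slope NaN, which then propagates to every result
path_b = slope_formula(x, y.astype(np.float32), x_new)

print(path_a)  # six NaNs followed by 1.0
print(path_b)  # all NaN
```

If this reading is right, it also bears on the numpy question above: numpy.interp short-circuits at exact sample points, so the NaN at x = 0 never reaches x = 6, whereas the slope formula multiplies a NaN slope into every point.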

Issue now posted on scipy repo: https://github.com/scipy/scipy/issues/20152

wilson0028 (Feb 26 '24 18:02)

From the scipy documentation, "We note that scipy.interpolate does not support interpolation with missing data. Two popular ways of representing missing data are using masked arrays of the numpy.ma library, and encoding missing values as not-a-number, NaN."

If scipy does not support interpolation with missing data, does that mean that, by extension, xarray does not either?

wilson0028 (Feb 26 '24 19:02)
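One common way to act on the SciPy advice quoted above is to drop the missing samples before interpolating. The numpy-only sketch below assumes that this "skip NaN" behaviour is acceptable. The helper name interp_skipping_nan is hypothetical and is not part of xarray, numpy, or scipy.

```python
import numpy as np


def interp_skipping_nan(x, y, x_new):
    """Hypothetical helper: linearly interpolate using only the finite samples of y.

    Query points outside the range of the finite samples are returned as NaN.
    numpy.interp raises ValueError if y has no finite values at all.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    valid = np.isfinite(y)  # mask out NaN (and inf) samples
    return np.interp(x_new, x[valid], y[valid], left=np.nan, right=np.nan)


# Data from the issue: only the sample at x=6 is valid, so only x_new == 6 gets a value
print(interp_skipping_nan([0, 6], [np.nan, 1], np.arange(7)))

# An interior gap is bridged instead of being propagated as NaN
print(interp_skipping_nan([0, 1, 2], [1.0, np.nan, 3.0], [1.0]))  # [2.]
```

Note the design trade-off: this gives the reporter's expected output for the issue's data, but it fills interior gaps between valid samples rather than propagating NaN. Whether that is desirable depends on the use case.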