xarray icon indicating copy to clipboard operation
xarray copied to clipboard

Interpolate_na: Rework 'limit' argument documentation/implementation

Open Ockenfuss opened this issue 1 year ago • 6 comments

What is your issue?

Currently, the 'limit' argument of interpolate_na shows some counterintuitive/undocumented behaviour. Take the following example:

import xarray  as xr
import numpy as np
n=np.nan
da=xr.DataArray([n, n, n, 4, 5, n ,n ,n], dims=["y"])
da.interpolate_na('y', limit=1, fill_value='extrapolate')

This will produce the following result:

array([ 1., nan, nan,  4.,  5.,  6., nan, nan])

Two things are surprising, in my opinion:

  1. The interpolated value 1 at the beginning is far from any of the given values
  2. The filling is done only towards the 'right'. This asymmetric behaviour is not mentioned in the documentation.

Comparison to pandas

Similar behaviour can be created using pandas with the following arguments:

da=xr.DataArray([n, n, n, 4, 5, n ,n ,n], dims=["y"])
dap=da.to_pandas()
dap.interpolate(method='slinear', limit=1, limit_direction='forward', fill_value='extrapolate')
Output
y
0    NaN
1    NaN
2    NaN
3    4.0
4    5.0
5    6.0
6    NaN
7    NaN
dtype: float64

This is equivalent to the current xarray behaviour, except there is no 1 at the beginning.

Cause

Currently, the fill mask in xarray is implemented using a rolling window operation, where values outside the array are assumed to be valid (therefore the 1). See xarray.core.missing._get_valid_fill_mask

Possible Solutions

Boundary Issue

Concerning the 1 at the beginning: I think this should be considered a bug. It is likely not what you would expect if you specify a limit. As stated, pandas does not create it as well.

Asymmetric Filling

Concerning the asymmetric filling, I see two options:

  1. No changes to the code, but mention in the documentation that (effectively), a forward-fill is done.
  2. Make something similar to what pandas is doing. In pandas, there are two additional arguments controlling the limit behaviour: limit_direction is controlling the fill direction (left, right or both). limit_area effectively controls if we only do interpolation or allow for extrapolation as well.

What do you think?

Ockenfuss avatar Mar 23 '23 16:03 Ockenfuss

Two comments if the pandas arguments are introduced:

  • If the pandas arguments are introduced, one might raise the question, whether the max_gap argument of xarray is still necessary, since pandas does not have such an argument. However, if the user does not want any interpolation if a gap is bigger than a specified length, this is currently not possible with pandas (actually, the max_gap feature is requested by the pandas community since a long time: pandas-dev/pandas#12187 )
  • Coordinate handling: Currently, limit is operating on indices, max_gap is operating on coordinates. Is there a deeper reason for this separation? Both operations limit the interpolation 'distance'. If not, similar to use_coordinates, one could introduce something like limit_use_coordinates. If True, limit as well as max_gap refer to numeric coordinates, otherwise to a linearly increasing index ([0,1,2,...]).

Ockenfuss avatar Mar 23 '23 16:03 Ockenfuss

Do you have any thoughts on this?

I think with the following signature, the function will be backward compatible:

interpolate_na(dim, method='linear', use_coordinate=True, limit=None, limit_direction='forward', limit_area=None, limit_use_coordinate=False, max_gap=None)

Ockenfuss avatar May 20 '23 18:05 Ockenfuss

Thanks for raising this issue @Ockenfuss and thoroughly documenting the behavior.

The interpolated value 1 at the beginning is far from any of the given values

Agreed that is surprising and seems like a bug!

The filling is done only towards the 'right'. This asymmetric behaviour is not mentioned in the documentation

A pull request would be most welcome @Ockenfuss - scoped to either just improve the documentation or with your preferred API change. We can discuss the preferred syntax in the pull request.

scottyhq avatar Nov 29 '23 19:11 scottyhq

Thanks for the feedback @scottyhq! I used the time over christmas and tried to come up with an implementation, see #8577! As you suggested, I pointed out specific details to discuss in the PR.

Ockenfuss avatar Dec 30 '23 23:12 Ockenfuss

After the Xarray meeting today (with @keewis and @dcherian) we discussed the preferred option for moving forward with cleaning up the limit argument. The concensus was to introduce a new keyword argument (e.g., cutoff) with the preferred default handling of limit, and to soft deprecate the old limit argument.

shoyer avatar Mar 13 '24 16:03 shoyer

Some more results from the discussion:

  • This affects all filling methods: interpolate_na, fillna, ffill, bfill, ...
  • We want two functionalities: limit (or cutoff) to limit filling distance, max_gap (or ?) to avoid any filling
  • Both should work on coordinates or index
  • The mask is a valuable product by itself
  • We prefer a new API to avoid breaking changes

Here are my proposals:

Masking function Signature:

namask('t', limit=3, limit_coords=3h, max_gap=3, max_gap_coords=3h) #in line with pandas/current xarray
namask('t', limit=3, cutoff=3h, gap_limit=3, gap_cutoff=3h) #cutoff for coordinates, limit for index

For direction control:

namask(limit_direction=forward/backward/both, limit_area=outside/inside/None) #regardless of index or coordinate

Possible API:

Making it an object:

da.namask(...) #masking object
da.namask(...).fillna(123) #return dataarray
da.namask(...).mask #get mask only

Making it a method:

mask=da.namask(...)
da.fillna(123).where(~mask) #Invalid=True to match numpy behaviour?

How to deal with existing methods:

  • da.fillna(123) shall still work
  • If any exsiting limit, max_gap is used: Internally forward to new API, raise Deprecation Warning (but possibly support for very long/forever)

Anything else: This almost feels like introducing numpy.ma in xarray... This is beyond the scope of this Issue! But we may copy some of the behaviour from numpy, e.g. invalid values are True there.

Ockenfuss avatar Mar 13 '24 17:03 Ockenfuss