follow upstream scipy interpolation improvements
Is your feature request related to a problem?
Scipy 1.10.0 has some great improvements to interpolation (release notes) particularly around the fancier methods like pchip.
It'd be good to see if we can simplify some of our code (or even enable using these options).
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response
I would like to work on this. For clarity, the goal is to support tensor product interpolation wherever scipy interpn supports it, and remove any no-longer-necessary dimension checks (as in #9049)?
Nice!
the goal is to support tensor product interpolation wherever scipy interpn supports it, and remove any no-longer-necessary dimension checks
Yes, I believe so.
We may also want to replace interp1d with our own version that class make_interp_spline as mentioned here: https://github.com/pydata/xarray/issues/9404
The 1d interpolation situation seems to have a few additional inconsistencies, maybe due to updates to the scipy interface.
-
da.interp()cannot access any of the methods inInterp1dOptionsdue to thevectorizeable_onlyargument, as raised by #9049. However, these methods are accessible viada.interpolate_na().I suppose this could provide a workaround if someone needed these arguments. - I believe that "polynomial" interpolation (in
_get_interpolator,ScipyInterpolator) is misnamed and actually refers to spline interpolation, since this is how SciPyinterp1dwill interpret it. Indeed polynomial interpolation does not take a degree argument. However polynomial interpolation is being implemented by the BarycentricInterpolator and KroghInterpolator which are supported. - I think that the implementation of spline interpolation (e.g.
SplineInterpolator, accessible viainterpolate_na()) is deprecated and is now performing smoothing splines rather than spline interpolation, perhaps due to the deprecation of the argumentnu. - It seems better to me if
interpolate_na()were accomplished via a call toda.interp(), or vice versa.
I think that these point (maybe except 4) will be resolved in the course of solving the original issue, just wanted to make sure I'm on the right wavelength.
This seems right. I recommend opening as small a PR as possible for easy review rather than a large one that solves many issues. Let us know if you need help.
Does (1) seems like an easier place to start?
I am having a hard time tracing the impact of the vectorizeability of _interp_func. I wonder if there is some decorator-type behavior about detecting the vectorizeability of functions that is hidden from the stack trace?
The vectorizeable_only flag is not explicitly being passed beyond the choice of interp_class, but the choice of interp function determines the shape of the var that is passed to _interpnd.
Thanks for any insight.
What happens if you set vectorizeable_only to False in interp_func?
It fails test_interpolate_vectorize in test_interp.py. Because if vectorizeable_only is False then it will use numpy.interp to interpolate. This will eventually result in an object too deep for desired array error. The reason this is mysterious to me is that this change affects the input var to the interp_func to be two-dimensional, whereas if you leave the flag in its original position the input var is one-dimensional. I cannot find where the behavior diverges prior to the call of interp_func.
Sorry that is a bit gnarly, this module hasn't been touched in a while, so we lack some context.
Do (2) or (3) in your list above feel more approachable?
I have been unable to reproduce the strange behavior described in my previous comment so I think it's actually behaving reasonably. Thanks for the response. I have submitted a PR related to #9049 and will continue working on this.
I've moved on to the implementation of tensor product interpolators via scipy.interpolate.interpn. This has raised the following design question:
To my understanding, a few of the new interpolators (cubic and quintic tensor product splines) are "genuinely multidimensional", so an equivalent result would not be produced by applying a lower-dimensional analogue along dimensions sequentially. However missing.interp has a built in optimization to decompose interpolations in this way when possible. The optimization seems to have been introduced in response to this issue. I'm confused by the fact that overriding scipy's internal logic results in a speedup but it seems true, and there may be users relying on this optimization.
One solution would be to disable this optimization when a "genuinely multidimensional" interpolator is encountered. This would solve the issue and be backwargs-compatible. The only issue is that it would require me to figure out which interpolators are genuinely multidimensional! But the worst case scenario here is just a missed optimization and perhaps some embarrassment. My real dream would be to "pass the buck to scipy"--write things in a way that does not require any understanding of the scipy interpolators.
The only issue is that it would require me to figure out which interpolators are genuinely multidimensional! But the worst case scenario here is just a missed optimization and perhaps some embarrassment.
This should only be true for "linear" and "nearest". A test that compares our output to scipy's output should confirm this (and would be a good test to have!)