BUG: np.interp gives wrong results if the input xp isn't sorted
MWE:
import numpy
xp = [0.0, 1.0]
fp = [0.2, 0.3]
print(numpy.interp(0.5, xp, fp)) # 0.25, correct
xp = [1.0, 0.0]
fp = [0.3, 0.2]
print(numpy.interp(0.5, xp, fp)) # 0.2, WRONG
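A quick workaround for the decreasing case in the MWE is to reverse both arrays before the call:

```python
import numpy

# Reversing both xp and fp restores the increasing order that interp assumes.
xp = [1.0, 0.0]
fp = [0.3, 0.2]
print(numpy.interp(0.5, xp[::-1], fp[::-1]))  # 0.25
```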
The docstring explicitly states:
xp : 1-D sequence of floats
The x-coordinates of the data points, must be increasing if argument
`period` is not specified. Otherwise, `xp` is internally sorted after
normalizing the periodic boundaries with ``xp = xp % period``.
The description is accurate: sorting does happen if a period is specified:
np.interp(0.5, xp, fp, period=np.inf)
# 0.25
But perhaps we should make np.interp better... At least, it would not be unreasonable to expect it to work for monotonically increasing or decreasing values. Sorting should definitely be optional, as it is quite slow. And I guess even now one could add to the docstring that if sorting is wanted for non-periodic data, passing period=np.inf works.
At least, it would not be unreasonable to expect it to work for monotonically increasing or decreasing values.
I'm not a huge fan of this, as it adds runtime to interp to check that an already-sorted sequence is sorted. For similar reasons, I'd avoid using np.digitize(x, bins), and use np.searchsorted(bins, x, side='right') instead, which is the same but without the redundant check.
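For reference, a minimal sketch showing that the two calls give the same indices on a sorted bin array (the check referred to above is the monotonicity validation digitize performs):

```python
import numpy as np

bins = np.array([0.0, 1.0, 2.0])
x = np.array([0.5, 1.5])

# Per the comment above, np.digitize does extra monotonicity checking,
# while np.searchsorted with side='right' yields the same indices without it.
print(np.digitize(x, bins))                    # [1 2]
print(np.searchsorted(bins, x, side='right'))  # [1 2]
```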
Essentially, either your data is:
- In a random order, which is unlikely to be meaningful to interp
- Sorted in a known increasing/decreasing order, in which case requiring a manual reverse would be valuable
- Sorted in an unknown order - in this case, the caller can still do a faster job of detecting this by just comparing `a[0]` with `a[-1]`, taking advantage of the knowledge that it is sorted.
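The endpoint comparison in the last bullet can be sketched as an O(1) caller-side fix-up (assuming the data is known to be sorted in one direction or the other):

```python
import numpy as np

xp = np.array([1.0, 0.0])  # sorted, but direction unknown
fp = np.array([0.3, 0.2])

# O(1) direction check: compare the endpoints instead of scanning the array,
# then reverse both arrays if the grid is decreasing.
if xp[0] > xp[-1]:
    xp, fp = xp[::-1], fp[::-1]

print(np.interp(0.5, xp, fp))  # 0.25
```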
Is it possible to trivially check in searchsorted that the array is actually sorted during the iteration?
@eric-wieser - the only reason I suggested it is that it would be a very cheap addition (to searchsorted I guess). But I agree with your point that it is equally trivial for the caller to determine it.
Essentially, either your data is: [...]
Good points. Another option would be to raise an exception if the input data is not sorted as expected. The check
numpy.all(a[1:] > a[:-1])
is pretty cheap.
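A minimal sketch of that option as a caller-side wrapper (checked_interp is a hypothetical name, not a NumPy API):

```python
import numpy as np

def checked_interp(x, xp, fp):
    """Hypothetical wrapper: fail loudly instead of returning garbage."""
    xp = np.asarray(xp)
    # the cheap strictness check suggested above
    if not np.all(xp[1:] > xp[:-1]):
        raise ValueError("xp must be strictly increasing")
    return np.interp(x, xp, fp)

print(checked_interp(0.5, [0.0, 1.0], [0.2, 0.3]))  # 0.25
```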
I'd concur that this is a big problem. I don't recall that np.interp always worked this way. It can easily introduce bugs in code if x is monotonically decreasing instead of increasing:
In [113]: np.interp(xp, x, y)
Out[113]: -1e+34
In [114]: np.interp(-xp, -x, y)
Out[114]: 5.274509803921655
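The negation trick in the second call works because negating a decreasing grid makes it increasing while keeping each (x, y) pair aligned; a sketch with made-up numbers:

```python
import numpy as np

x = np.array([2.0, 1.0, 0.0])  # decreasing grid
y = np.array([4.0, 1.0, 0.0])

# -x = [-2, -1, 0] is increasing and still paired with the same y values,
# so evaluating at the negated query gives the answer for the original data.
print(np.interp(-1.5, -x, y))  # 2.5, i.e. the value at x = 1.5
```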
Thankfully this isn't a subtle bug... this time. Throughout this process I probably read the documentation in ipython 2-4 times via `np.interp?`. There was nothing to suggest that monotonicity was required, although in hindsight I should have been suspicious from
xp : 1-D sequence of floats
The x-coordinates of the data points, must be increasing if argument
`period` is not specified. Otherwise, `xp` is internally sorted after
normalizing the periodic boundaries with ``xp = xp % period``.
Even a simple assert that a[0] < a[-1] would be a good idea to prevent this type of behavior from happening, but I believe the docstring probably needs editing for clarity.
I don't believe many people think of an interpolant as requiring a monotonic increase (e.g., https://www.mathworks.com/help/matlab/ref/interp1.html). Requiring this implicitly, with no warning when it is violated (which could produce a very, very nasty bug), is dangerous to the community at large using this function.
If runtime is this important, perhaps the function should be renamed to np.incrinterp and there could be a similar np.decrinterp function. But, I suppose that https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html is anticipated to be the scientific workhorse.
Perhaps an edit to the docstring would be good, from the current
One-dimensional linear interpolation.
Returns the one-dimensional piecewise linear interpolant to a function
with given values at discrete data-points.
to
One-dimensional linear interpolation, requiring monotonically increasing data.
Returns the one-dimensional piecewise linear interpolant to a function
with given values at discrete data-points that are monotonically increasing in
the independent variable.
@eric-wieser, thanks in advance for your consideration of this request to clarify np.interp in some way to help others in the future avoid bugs and save time.
Most simply, I'd recommend moving
Does not check that the x-coordinate sequence `xp` is increasing.
from the notes to the main doc string so this is one of the first things folks see. This won't affect runtime and will get the key message across clearly.
So I ran into (nearly) this problem and it caused me headaches to find out what was causing the unexpected behaviour. It turned out I provided a monotonically increasing (but not strictly) xp to np.interp:
MWE
import numpy as np
x = np.array([0, 1, 1])
y = np.array([0, 1, 0])
print(np.interp(x, x, y))
Output
array([0., 0., 0.])
I honestly would have expected to get an error on this from numpy before I found the discussion. I had skimmed the documentation beforehand, but assumed that "increasing" did not mean strictly increasing.
Would it be possible to include "strictly" in the documentation as well, along with a warning that this is not being checked for? :)
@euronion - it is not surprising the code doesn't know what to do if you try to interpolate through a bi-valued function... Still, since equal-valued is such a corner case, adding "strictly" to the documentation is not a bad idea - could you perhaps make a quick PR? The github interface should suffice for this (the relevant file is numpy/lib/function_base.py).
@mhvk - Sure thing. I hope I did not violate too many guidelines for commits and PRs.
Regarding the topic:
I would personally prefer to have numpy check for this, even with the performance penalty.
Thanks, note the PR must be to the numpy branch!
On topic: I quite strongly prefer not checking (the check is not cheap compared to the operation), but can see an argument for at least having an auto-sort option. Indeed, period=np.inf will do that for you (though it gives a warning).
Thanks, note the PR must be to the numpy branch!
My mistake, sorry.
Indeed, `period=np.inf` will do that for you (though it gives a warning).
Will keep this in mind, thanks for pointing it out.
import numpy as np
x = np.array([0, 1, 1])
y = np.array([0, 1, 0])
print(np.interp(x, x, y))
Output
array([0., 0., 0.])
Note that the result is correct under the assumption that you are trying to interpolate a right-continuous function, see also my comment at https://github.com/scipy/scipy/issues/9886#issuecomment-468809786.
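To illustrate the right-continuous reading of the duplicated point:

```python
import numpy as np

x = np.array([0, 1, 1])
y = np.array([0, 1, 0])

# Approaching the duplicated point from the left follows the first segment;
# at the point itself interp takes the right-hand value, i.e. right-continuity.
print(np.interp(0.999, x, y))  # 0.999
print(np.interp(1.0, x, y))    # 0.0
```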
It seems that there is hesitation to provide checking for this very insidious behavior because of the impact on runtime. Could checking for increasing monotonicity not be enabled by an argument? I feel that in the vast majority of use cases, the bugs produced by this behavior are more detrimental than the runtime hit, but if runtime is of critical importance to the use case, one could disable checking by passing in a simple check_monotonicity argument boolean.
The docs are pretty clear about this now. The first line reads:
One-dimensional linear interpolation for monotonically increasing sample points.
And the xp parameter is described as
The x-coordinates of the data points, must be increasing if argument period is not specified. Otherwise, xp is internally sorted after normalizing the periodic boundaries with xp = xp % period.
Then there's also an explicit warning:
The x-coordinate sequence is expected to be increasing, but this is not explicitly enforced. However, if the sequence `xp` is non-increasing, interpolation results are meaningless.
So this behavior is explicitly mentioned in three different places, making it pretty hard to miss.
Could checking for increasing monotonicity not be enabled by an argument? I feel that in the vast majority of use cases, the bugs produced by this behavior are more detrimental than the runtime hit, but if runtime is of critical importance to the use case, one could disable checking by passing in a simple check_monotonicity argument boolean.
If you don't know if your data is sorted, then why not just use something like np.sort(xp, stable=True)? I doubt that a check_monotonicity=True would be faster for sorted arrays, and it will crash your program otherwise. So I don't see any benefit in adding it.
Yes I am aware that the docs call it out, but users are not looking at the docs every single time they use the function.
A frequent use of interp is in routine data analysis and one-off small jobs, not just as part of larger codebases. Very frequently you are getting data from various sources which may or may not be monotonic, and/or may or may not be monotonically increasing. When doing data analysis, you're thinking about the transformations and analysis, not the mechanics of numpy. It is very easy to forget that interp demands a strictly monotonically increasing x when you're focused on the analysis.

The fact that it often gives you reasonable, but very wrong, values is very deceiving. At best this causes frustration and costs debugging time until you realize that it is an issue with the data and interp and go and rectify it; at worst you believe the data, and that causes a host of downstream issues. I have hit this issue countless times when doing data analysis.
The fact that other interpolating methods that require strict increasing monotonicity do this check by default, e.g. scipy.interpolate.CubicSpline, and throw a ValueError when it is not met, shows that many communities and users consider this an important feature.
Yes I am aware that the docs call it out, but users are not looking at the docs every single time they use the function.
Adding an optional keyword argument requires you to be aware of monotonicity requirement too, so that won't solve this problem.
But like I said before, if your data might not be sorted, then you should probably just sort it.
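A sketch of that suggestion, sorting xp and applying the same permutation to fp so the pairs stay aligned (using kind='stable', which also works on NumPy versions before the stable=True keyword existed):

```python
import numpy as np

xp = np.array([1.0, 0.0])
fp = np.array([0.3, 0.2])

# Sort xp and reorder fp with the same permutation before interpolating.
order = np.argsort(xp, kind='stable')
print(np.interp(0.5, xp[order], fp[order]))  # 0.25
```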
But if you don't want to that for whatever reason, then I'd suggest using an assert instead:
assert np.all(np.sort(xp, stable=True) == xp)
I would consider this to be more idiomatic than the keyword argument you suggested.
Agreed with @jorenham - also because there are more than a few cases where sorting the data just gives a different wrong answer (e.g., if xp, yp are smooth but non-monotonic - sorting then definitely won't help, you would actually have to specify the branch).
Well, I would've suggested the check occur by default, as it is done in other packages, but I know that would be a breaking change.
Fine, just wanted to voice my opinion and agree with the others earlier in this thread who voiced similar sentiments. I see that this is not a popular one among the maintainers of numpy.
By the way, I do think this issue has become more relevant as Scipy has moved its interp1d function to legacy status and now explicitly directs users to numpy interp. interp1d has no requirement on monotonicity whatsoever, so former users of that functionality will not be expecting such a requirement. Yes they should read the docs first before using a new function, but we all know that many will look at the call signature and use it blindly.
Furthermore, their new page on 1D interpolation, with an example of using np.interp for 1D linear interpolation, does not mention the monotonicity requirement at all.