BUG: In `main`, using `resample().interpolate(inplace=True)` raises an exception
Pandas version checks
-
[X] I have checked that this issue has not already been reported.
-
[X] I have confirmed this bug exists on the latest version of pandas.
-
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import numpy as np
from pandas import date_range, DataFrame
DR = date_range("1999-1-1", periods=365, freq="D")
DF_ = DataFrame(np.random.standard_normal((365, 1)), index=DR)
S = DF_.iloc[:, 0]
DF = DataFrame({"col1": S, "col2": S})
DF.resample("ME").interpolate(inplace=True)
Issue Description
Code raises an exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Anaconda3\envs\pandasstubs\lib\site-packages\pandas\core\resample.py", line 958, in interpolate
result_interpolated = result_interpolated.loc[final_index]
AttributeError: 'NoneType' object has no attribute 'loc'
Expected Behavior
No exception. Works fine in pandas 2.2.2
Note: Seems to be introduced by https://github.com/pandas-dev/pandas/pull/56515 by @cbpygit , review by @MarcoGorelli and @jbrockmendel
Installed Versions
INSTALLED VERSIONS
commit : 34177d6b20896bf101590306e3f0e1e0a225c5fb python : 3.9.16.final.0 python-bits : 64 OS : Windows OS-release : 10 Version : 10.0.19045 machine : AMD64 processor : Intel64 Family 6 Model 158 Stepping 13, GenuineIntel byteorder : little LC_ALL : None LANG : None LOCALE : English_United States.1252
pandas : 3.0.0.dev0+943.g34177d6b20 numpy : 1.26.4 pytz : 2024.1 dateutil : 2.9.0.post0 setuptools : 68.2.2 pip : 22.3.1 Cython : None pytest : 8.2.0 hypothesis : None sphinx : None blosc : None feather : None xlsxwriter : 3.2.0 lxml.etree : 5.2.1 html5lib : 1.1 pymysql : None psycopg2 : None jinja2 : 3.1.4 IPython : None pandas_datareader : None adbc-driver-postgresql: None adbc-driver-sqlite : None bs4 : 4.12.3 bottleneck : None fastparquet : None fsspec : None gcsfs : None matplotlib : 3.8.4 numba : None numexpr : 2.10.0 odfpy : None openpyxl : 3.1.2 pyarrow : 16.0.0 pyreadstat : 1.2.7 python-calamine : None pyxlsb : 1.0.10 s3fs : None scipy : 1.13.0 sqlalchemy : 2.0.30 tables : 3.9.2 tabulate : 0.9.0 xarray : 2024.3.0 xlrd : 2.0.1 zstandard : 0.22.0 tzdata : 2024.1 qtpy : None pyqt5 : None
@Dr-Irv Thank you for reporting this problem! I am able to reproduce this. The problem is purely related to the inplace=True option. The interpolation result here becomes None and internally the _update_inplace method call here does not yet possess the final result. There is still the step of index correction missing, which is only applied afterwards here.
The problem is that the context in pandas.core.generic.NDFrame.interpolate does not know about the correct final index. So far, I am unable to figure out how I can update the state of the parent data frame that led to the pandas.core.resample.Resampler.interpolate call. I tried to manipulate self._selected_obj but it does not propagate. If somebody can point me at the object that needs to be updated I can easily fix this.
I could use some input here @MarcoGorelli. As we are introducing a breaking change anyway, we could think about removing the inplace option. I think it is counter-intuitive that the second method in a chain of methods modifies the original data frame "in place". Just think about the completely valid option to split the calls like this:
df = ...
resampler = df.resample("ME") # creates a resampler instance, does not modify df
resampler.interpolate(inplace=True) # modifies df, returns None
we could think about removing the inplace option.
yup - this looks like one of the places where inplace was a lie anyway? Agree, "in for a penny, in for a pound" - if we gonna break, let's break. It's for the better anyway
@MarcoGorelli 🙌 I'll take care of it and ping you in the PR.
@MarcoGorelli I looked into this but it is extremely convoluted 😢 The inplace option is passed on in multiple steps, partially to overloaded methods or abstract methods with many implementations. There are dozens of tests that use inplace=True or even test this in particular.
Before I spend a lot of time on this, could you indicate a safe way to approach it? Shall I really drop the inplace argument in all those methods, delete all the tests against the feature and switch to not-inplace in the tests that use it?
yeah i'd be ok with that
as in, remove it from interpolate. if any intermediate function (which is also used in other places) requires inplace, just pass inplace=False
@cbpygit is this a good place to ask about dtype downcasting deprecation FutureWarning? i see you are responsible for fixing .resample()...
.resample().interpolate() now produces FutureWarning: DataFrame.interpolate with object dtype is deprecated and will raise in a future version. Call obj.infer_objects(copy=False) before interpolating instead. but .resample() doesn't have infer_objects()
there is also an interesting question outlined in here that is think is related: https://github.com/pandas-dev/pandas/issues/53631#issuecomment-1853635791
should we continue in this issue?