BUG: Unclear FutureWarning regarding inplace iloc setitem
Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import numpy as np, pandas as pd
values = np.arange(4).reshape(2, 2)
df = pd.DataFrame(values, columns=["a", "b"])
new = np.array([10, 11]).astype(np.int16)
df.loc[:, "a"] = new  # emits the FutureWarning quoted below
Issue Description
FutureWarning: In a future version, df.iloc[:, i] = newvals will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either df[df.columns[i]] = newvals or, if columns are non-unique, df.isetitem(i, newvals)
This is confusing because I did not do df.iloc, I did df.loc. In the release notes, the subsection header mentions .loc, but the text only talks about .iloc.
Additionally, it was very difficult to put together a reproducible example until I found a related issue demonstrating that it matters whether the old and new values have different dtypes. This is reasonably clear from the release notes themselves, but not from the warning message.
Expected Behavior
I assume that this change affects both .loc and .iloc, so the warning message could be updated to be clearer; but in the event it is a false alarm for .loc, it would be good to suppress it.
The warning message could also be a little clearer about why it got triggered (even if only in a general sense).
Installed Versions
INSTALLED VERSIONS
commit : 87cfe4e38bafe7300a6003a1d18bd80f3f77c763
python : 3.10.0.final.0
python-bits : 64
OS : Darwin
OS-release : 21.6.0
Version : Darwin Kernel Version 21.6.0: Mon Aug 22 20:20:07 PDT 2022; root:xnu-8020.140.49~2/RELEASE_ARM64_T8110
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.5.0
numpy : 1.23.3
pytz : 2022.2.1
dateutil : 2.8.2
setuptools : 63.4.1
pip : 22.1.2
Cython : None
pytest : 7.1.3
hypothesis : None
sphinx : 5.1.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.5.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.6.0
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.9.1
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None
Actually even looking at the release notes, I don't think I understand exactly what is deprecated here. The relevant section starts with
Most of the time setting values with DataFrame.iloc() attempts to set values inplace, only falling back to inserting a new array if necessary.
But then it says
This behavior is deprecated. In a future version, setting an entire column with iloc will attempt to operate inplace.
Isn't this what already happens ("most of the time")? And if it's about different dtypes, how will that work? Do you mean to say it will coerce the dtype when setting the data?
Seems like this warning was added in https://github.com/pandas-dev/pandas/pull/45333, along with some TODOs to get more info. Maybe @jbrockmendel has some guidance on what to do here.
I'm seeing this same behaviour in another package (bug report linked above). Can confirm it's also when a change in dtype is happening, and with loc instead of iloc.
Could change "iloc" to "loc/iloc"?
I'm not sure that's a sufficient change @jbrockmendel, because df.isetitem will not work in that case and there is no corresponding df.setitem.
I also suspect it's possible to get into a situation where if you do df[col] = data you get a SettingWithCopyWarning telling you to do df.loc[:, col] = data, but then when you do df.loc[:, col], you'll get a warning telling you to do df[col] = data. I think people will find that confusing.
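For illustration, a rough sketch of that circular situation (assuming pandas 1.5.x; whether each warning actually fires depends on whether df2 is treated as a copy and on the dtypes involved, and the frame and column names here are made up):
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = df[df["a"] > 1]                     # pandas flags df2 as possibly being a copy of df
new = np.array([10, 11], dtype=np.int16)  # different dtype than the existing int64 column

df2["b"] = new         # SettingWithCopyWarning: suggests using .loc[...] = value instead
df2.loc[:, "b"] = new  # FutureWarning: suggests df[df.columns[i]] = newvals / df.isetitem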
What's the difference between inplace and setting a new array? Neither the warning nor the release notes make this very clear to me.
I have a script that runs df.loc[i:ii, c] = newvals in a loop over a number of dataframes, and only some of the cases emit this warning, but in all cases the values are set correctly.
I also don't see how either df[df.columns[i]] = newvals or df.isetitem(i, newvals) can be a substitute for either df.loc[i:ii, c] or df.iloc[i:ii, c].
What's the difference between inplace and setting a new array? Neither the warning nor the release notes make this very clear to me.
Definitely open to suggestions for better wording. Let me try to explain using the OP example:
import numpy as np
import pandas as pd
values = np.arange(4).reshape(2, 2)
df = pd.DataFrame(values, columns=["a", "b"])
At this point the DataFrame df is directly backed by the original values, so doing something like values[0, 0] = 99 would affect the values in df.
There are types of setting (e.g. df.iloc[0, 0] = 11) that will be "inplace" and will edit the original values, and others (e.g. df["B"] = 42) that will create a new array and NOT edit the original values.
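To make that concrete, here is a minimal sketch (assuming pandas 1.5.x and that the constructor did not copy values, which is the default for a 2D ndarray of matching dtype):
import numpy as np
import pandas as pd

values = np.arange(4).reshape(2, 2)
df = pd.DataFrame(values, columns=["a", "b"])

df.iloc[0, 0] = 11   # "inplace": writes into the existing int64 array
print(values[0, 0])  # 11 -- the original ndarray was modified

df["b"] = 42         # not inplace: column "b" now points to a brand-new array
print(values[:, 1])  # [1 3] -- the original ndarray is untouched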
Unfortunately, because of [reasons], the existing behavior is not super-consistent in when we do inplace vs not-inplace. That's why this particular case is deprecated, so in 2.0 we can make the behavior more consistent.
In particular, in your case:
new = np.array([10, 11]).astype(np.int16)
df.loc[:, "a"] = new # <- issues the warning about future behavior changing
In the current behavior, df.loc[:, "a"] = new is NOT inplace, but in the future it will be. The warning here is just in case you really care about inplace-vs-not; to keep the old behavior you need to do df["a"] = new instead.
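One observable consequence, as a sketch (assuming pandas 1.5.x for the "current" lines; the comments about the future behaviour only restate what the warning says):
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(4).reshape(2, 2), columns=["a", "b"])
new = np.array([10, 11]).astype(np.int16)

df.loc[:, "a"] = new  # current behaviour: not inplace, column "a" is replaced by the int16 array
print(df["a"].dtype)  # int16 today; under the future inplace behaviour the values would be
                      # written into the existing int64 array and the dtype would stay int64

df["a"] = new         # the suggested way to keep the old "replace the column" behaviour
print(df["a"].dtype)  # int16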
Is that helpful? If you have suggestions to make the warning or docs clearer, please let us know.
Is there a way to disable this warning? For example, I have gotten the warning, and I acknowledge that this behavior will change and am OK with it; can I disable the warning?
Why? I have warnings turned into errors in my test suite so that I can catch potential pandas misuse, but this new warning causes my tests to fail with (as far as I can tell) no way to disable the warning.
@pytest.mark.filterwarnings?
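For example, something along these lines (the test name and the matched message prefix are only illustrative):
import numpy as np
import pandas as pd
import pytest

# silence only this deprecation for a single test, while other warnings stay errors
@pytest.mark.filterwarnings("ignore:In a future version:FutureWarning")
def test_set_column_with_loc():
    df = pd.DataFrame({"a": [0, 2], "b": [1, 3]})
    df.loc[:, "a"] = np.array([10, 11], dtype=np.int16)
    assert df["a"].tolist() == [10, 11]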
@jbrockmendel Yes, that is what I ended up doing, but it's not what I was hoping for. To the best of my knowledge, other warnings from pandas are things that I can take direct action on, either to correct a potentially dangerous situation or to prepare for a future change; making the appropriate changes in these cases silences the warnings.
In this case, it seems more informative than a true warning, at least in the case where I am OK with the setting being in-place in the future (which I am). I am OK with using the warnings filter, but I was hoping for some way to suppress it at the pandas level so downstream users don't potentially see it as well.
I'm getting this warning indirectly when I call df.update(). I believe it does need to be fixed in pandas, at least in that spot.
this issue is quite annoying...
I'm getting this warning indirectly when I call df.update(). I believe it does need to be fixed in pandas, at least in that spot.
Me too, df.update() does:
self.loc[:, col] = expressions.where(mask, this, that)
The warning message essentially says that DataFrame.update currently does not update but will in future versions, which doesn't sound right.
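A rough sketch of how df.update() can surface the warning indirectly (assuming pandas 1.5.x; whether it actually fires depends on the dtypes involved):
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})            # int64 column
other = pd.DataFrame({"a": [1.5]}, index=[1])  # float64 values
df.update(other)                               # internally runs self.loc[:, "a"] = ... with a
                                               # float64 result, which triggers the FutureWarning
print(df)                                      # the update itself still works: a = [1.0, 1.5, 3.0]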
I'm getting this with both:
df.iloc[slice.index] = slice
and the equivalent (in my case):
df.update(slice)
Additionally, neither line's syntax matches the warning, which just adds to the confusion. It is clear from this bug report that the warning itself should not be showing regardless.
This StackOverflow entry directly references the df.update FutureWarning bug: https://stackoverflow.com/questions/74342791/pandas-dataframe-upsert-futurewarning-when-doing-dataframe-update-on-dataframe
Seeing as this warning shows up in places where users can't do anything about it, perhaps we should consider removing the warning?
I think most users don't care too much about whether the operation is truly in-place or not, so just making the breaking change in 2.0.0 seems unlikely to do much harm, especially if:
- it's clearly documented
- the current behaviour is inconsistent anyway
Furthermore, I don't think we should encourage filterwarnings, because then people risk missing warnings about breaking changes that are much more likely to cause bugs (e.g. the numeric_only default).
Seeing as this warning shows up in places where users can't do anything about it, perhaps we should consider removing the warning? [...]
I'd second this idea. If this is only a breaking change in the next major release, in my opinion, this would be an acceptable and preferred approach for me. I maintain a project that heavily depends on Pandas, and for some internal operations I get a few dozen warnings because of this. I am basically forced to silence them in some way or another, which is rather annoying. I suppose that there is no perfect solution, but I'd prefer to not have this warning given what the advantages and disadvantages are.
perhaps we should consider removing the warning?
Works for me.
awesome - if anyone here wants to make a PR, that'd be welcome https://pandas.pydata.org/docs/development/contributing.html
take
Should the fix also include a test in tests subdirectory to confirm that a warning is not raised?
yeah, you could use with tm.assert_produces_warning(None) for that (we should really run the whole test suite with -W error, but we're not there yet, so for now let's explicitly assert no warning is raised)
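A rough sketch of what such a test could look like (the test name is only a suggestion):
import numpy as np
import pandas as pd
import pandas._testing as tm

def test_loc_full_column_setitem_no_warning():
    df = pd.DataFrame(np.arange(4).reshape(2, 2), columns=["a", "b"])
    new = np.array([10, 11], dtype=np.int16)
    # assert that the full-column .loc assignment no longer emits any warning
    with tm.assert_produces_warning(None):
        df.loc[:, "a"] = new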
Hi @MarcoGorelli,
I have done some analysis as to why the warning is being raised for a loc call, and here are some findings.
- The dunder method __setitem__ for "loc" and "iloc" has a call to the function iloc._setitem_with_indexer(indexer, value, self.name) in pandas/core/indexing.py.
- It in turn calls _setitem_single_block, in which, if the values for all the rows of a column are being set (as is the case in this issue), _setitem_single_column gets called:

info_axis = self.obj._info_axis_number
item_labels = self.obj._get_axis(info_axis)
if isinstance(indexer, tuple):
    # if we are setting on the info axis ONLY
    # set using those methods to avoid block-splitting
    # logic here
    if (
        self.ndim == len(indexer) == 2
        and is_integer(indexer[1])
        and com.is_null_slice(indexer[0])
    ):
        col = item_labels[indexer[info_axis]]
        if len(item_labels.get_indexer_for([col])) == 1:
            # e.g. test_loc_setitem_empty_append_expands_rows
            loc = item_labels.get_loc(col)
            # Go through _setitem_single_column to get
            # FutureWarning if relevant.
            self._setitem_single_column(loc, value, indexer[0])
            return

- As per the pull request referenced in the comments above, #45333, the warning was added in the _setitem_single_column function:

warnings.warn(
    "In a future version, df.iloc[:, i] = newvals will attempt "
    "to set the values inplace instead of always setting a new "
    "array. To retain the old behavior, use either "
    "df[df.columns[i]] = newvals or, if columns are non-unique, "
    "df.isetitem(i, newvals)",
    FutureWarning,
    stacklevel=find_stack_level(),
)

- This function _setitem_single_column is called for both "loc" and "iloc" from multiple places.
Please let me know if my understanding is correct.
My question is: should the warning call inside this function be removed entirely? Or, since the warning is related to iloc, should an if statement be added so the warning is raised only for an iloc call?
Do we still want to consider removing the warning?
If so, then ideally we should do that for 1.5.3?
I'd highly welcome that. But the PR around that seems stalled. Also, as I read the discussion here and in the PR, there doesn't really seem to be a consensus about what to do with this warning. (Remove it, change it, ...)
.... as I read the discussion here and in the PR, there doesn't really seem to be a consensus about what to do with this warning. (Remove it, change it, ...)
They must undo this warning because it creates highly undesirable consequences.
Next, they should book and complete an Ayahuasca therapy session because they obviously cannot resolve this issue at their current level of consciousness and understanding.
@max0x7ba please refrain from such comments (the second paragraph), that's not helpful at all
In the PR, it was suggested we could also turn the FutureWarning into a DeprecationWarning (https://github.com/pandas-dev/pandas/pull/50044#discussion_r1039821875). That would at least make it less noisy for end users (i.e. if they use a method from a package that triggers this, they won't see it by default).
So I think there are basically three options:
- Keep the warning, but fix and clarify the message
- Change the warning type to DeprecationWarning (+ also fix the message)
- Remove the warning, and make the breaking change in 2.0
1 is not really an option, since we can not correctly guess all cases where the behavior would change (I think)
I'd be in favour of a DeprecationWarning; removing is a bit weird imo, since this signals to users that we don't want to do this change, but it's already enforced on main
1 is not really an option, since we can not correctly guess all cases where the behavior would change (I think)
It is an option by expanding the message to make it correct for all those different cases ...
removing is a bit weird imo, since this signals to users that we don't want to do this change, but it's already enforced on main
For users that already bumped into the warning, and then don't see it anymore at some point, they might indeed think it is solved and they don't need to do anything. But 1) we can clearly communicate in the whatsnew that this is not the reason the warning is removed (I know, not everyone reads this), and 2) for many users that still have to update to 1.5 that's not an issue.
Making it a deprecation warning is fine, imo. Then, as a package developer, I can suppress it from my logs once I have verified it's not a problem, and users won't be bothered. I think the main problem right now is that end users get spammed with this warning for code that they have no control over and that might be perfectly fine.