BUG: Unclear FutureWarning regarding inplace iloc setitem
Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import numpy as np, pandas as pd
values = np.arange(4).reshape(2, 2)
df = pd.DataFrame(values, columns=["a", "b"])
new = np.array([10, 11]).astype(np.int16)
df.loc[:, "a"] = new  # emits the FutureWarning quoted below
Issue Description
FutureWarning: In a future version, df.iloc[:, i] = newvals will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either df[df.columns[i]] = newvals or, if columns are non-unique, df.isetitem(i, newvals)
This is confusing because I did not do df.iloc, I did df.loc. In the release notes, the subsection header mentions .loc, but the text only talks about .iloc.
Additionally, it was very difficult to put together a reproducible example until I found a related issue demonstrating that it matters whether the old and new values have different dtypes. This is reasonably clear from the release notes themselves, but not from the warning message.
Expected Behavior
I assume that this change affects both .loc and .iloc, so the warning message could be updated to be clearer; but in the event it is a false alarm for .loc, it would be good to suppress it.
The warning message could also be a little clearer about why it got triggered (even if only in a general sense).
Installed Versions
INSTALLED VERSIONS
commit : 87cfe4e38bafe7300a6003a1d18bd80f3f77c763
python : 3.10.0.final.0
python-bits : 64
OS : Darwin
OS-release : 21.6.0
Version : Darwin Kernel Version 21.6.0: Mon Aug 22 20:20:07 PDT 2022; root:xnu-8020.140.49~2/RELEASE_ARM64_T8110
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.5.0
numpy : 1.23.3
pytz : 2022.2.1
dateutil : 2.8.2
setuptools : 63.4.1
pip : 22.1.2
Cython : None
pytest : 7.1.3
hypothesis : None
sphinx : 5.1.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.5.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.6.0
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.9.1
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None
Actually even looking at the release notes, I don't think I understand exactly what is deprecated here. The relevant section starts with
Most of the time setting values with DataFrame.iloc() attempts to set values inplace, only falling back to inserting a new array if necessary.
But then it says
This behavior is deprecated. In a future version, setting an entire column with iloc will attempt to operate inplace.
Isn't this what already happens ("most of the time")? And if it's about different dtypes, how will that work? Do you mean to say it will coerce the dtype when setting the data?
Seems like this warning was added in https://github.com/pandas-dev/pandas/pull/45333, along with some TODOs to get more info. Maybe @jbrockmendel has some guidance on what to do here.
I'm seeing this same behaviour in another package (bug report linked above). Can confirm it's also when a change in dtype is happening, and with loc instead of iloc.
Could change "iloc" to "loc/iloc"?
I'm not sure that's a sufficient change @jbrockmendel, because df.isetitem will not work in that case and there is no corresponding df.setitem.
I also suspect it's possible to get into a situation where if you do df[col] = data you get a SettingWithCopyWarning telling you to do df.loc[:, col] = data, but then when you do df.loc[:, col], you'll get a warning telling you to do df[col] = data. I think people will find that confusing.
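For illustration, a rough sketch of that circular situation (assuming pandas 1.5.x; whether each warning actually fires depends on whether df2 is treated as a copy and on the dtypes involved, and the frame and column names here are made up):
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = df[df["a"] > 1]                     # pandas flags df2 as possibly being a copy of df
new = np.array([10, 11], dtype=np.int16)  # different dtype than the existing int64 column

df2["b"] = new         # SettingWithCopyWarning: suggests using .loc[...] = value instead
df2.loc[:, "b"] = new  # FutureWarning: suggests df[df.columns[i]] = newvals / df.isetitem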
What's the difference between inplace and setting a new array? Neither the warning nor the release notes make this very clear to me.
I have a script that runs df.loc[i:ii, c] = newvals in a loop over a number of dataframes, and only some of the cases emit this warning, but in all cases the values are set correctly.
I also don't see how either df[df.columns[i]] = newvals or df.isetitem(i, newvals) can be a substitute for either df.loc[i:ii, c] or df.iloc[i:ii, c].
What's the difference between inplace and setting a new array? Neither the warning nor the release notes make this very clear to me.
Definitely open to suggestions for better wording. Let me try to explain using the OP example:
import numpy as np
import pandas as pd
values = np.arange(4).reshape(2, 2)
df = pd.DataFrame(values, columns=["a", "b"])
At this point the DataFrame df is directly backed by the original values, so doing something like values[0, 0] = 99 would affect the values in df.
There are types of setting (e.g. df.iloc[0, 0] = 11) that will be "inplace" and will edit the original values, and others (e.g. df["B"] = 42) that will create a new array and NOT edit the original values.
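To make that concrete, here is a minimal sketch (assuming pandas 1.5.x and that the constructor did not copy values, which is the default for a 2D ndarray of matching dtype):
import numpy as np
import pandas as pd

values = np.arange(4).reshape(2, 2)
df = pd.DataFrame(values, columns=["a", "b"])

df.iloc[0, 0] = 11   # "inplace": writes into the existing int64 array
print(values[0, 0])  # 11 -- the original ndarray was modified

df["b"] = 42         # not inplace: column "b" now points to a brand-new array
print(values[:, 1])  # [1 3] -- the original ndarray is untouched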
Unfortunately, because of [reasons], the existing behavior is not super-consistent in when we do inplace vs not-inplace. That's why this particular case is deprecated, so in 2.0 we can make the behavior more consistent.
In particular, in your case:
new = np.array([10, 11]).astype(np.int16)
df.loc[:, "a"] = new # <- issues the warning about future behavior changing
In the current behavior, df.loc[:, "a"] = new is NOT inplace, but in the future it will be. The warning here is just in case you really care about inplace-vs-not; to keep the old behavior you need to do df["a"] = new instead.
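One observable consequence, as a sketch (assuming pandas 1.5.x for the "current" lines; the comments about the future behaviour only restate what the warning says):
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(4).reshape(2, 2), columns=["a", "b"])
new = np.array([10, 11]).astype(np.int16)

df.loc[:, "a"] = new  # current behaviour: not inplace, column "a" is replaced by the int16 array
print(df["a"].dtype)  # int16 today; under the future inplace behaviour the values would be
                      # written into the existing int64 array and the dtype would stay int64

df["a"] = new         # the suggested way to keep the old "replace the column" behaviour
print(df["a"].dtype)  # int16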
Is that helpful? If you have suggestions to make the warning or docs clearer, please let us know.
Is there a way to disable this warning? For example, I have gotten the warning, and I acknowledge that this behavior will change and am OK with it; can I disable the warning?
Why? I have warnings turned into errors in my test suite so that I can catch potential pandas misuse, but this new warning causes my tests to fail with (as far as I can tell) no way to disable the warning.
@pytest.mark.filterwarnings?
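For example, something along these lines (the test name and the matched message prefix are only illustrative):
import numpy as np
import pandas as pd
import pytest

# silence only this deprecation for a single test, while other warnings stay errors
@pytest.mark.filterwarnings("ignore:In a future version:FutureWarning")
def test_set_column_with_loc():
    df = pd.DataFrame({"a": [0, 2], "b": [1, 3]})
    df.loc[:, "a"] = np.array([10, 11], dtype=np.int16)
    assert df["a"].tolist() == [10, 11]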
@jbrockmendel Yes, that is what I ended up doing, but it's not what I was hoping for. To the best of my knowledge, other warnings from pandas are things that I can take direct action on, either to correct a potentially dangerous situation or to prepare for a future change; making the appropriate changes in these cases silences the warnings.
In this case, it seems more informative than a true warning, at least in the case where I am OK with the setting being in-place in the future (which I am). I am OK with using the warnings filter, but I was hoping for some way to suppress it at the pandas level so downstream users don't potentially see it as well.
I'm getting this warning indirectly when I call df.update(). I believe it does need to be fixed in pandas, at least in that spot.
this issue is quite annoying...
I'm getting this warning indirectly when I call df.update(). I believe it does need to be fixed in pandas, at least in that spot.
Me too, df.update() does:
self.loc[:, col] = expressions.where(mask, this, that)
The warning message essentially says that DataFrame.update currently does not update but will in future versions, which doesn't sound right.
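A rough sketch of how df.update() can surface the warning indirectly (assuming pandas 1.5.x; whether it actually fires depends on the dtypes involved):
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})            # int64 column
other = pd.DataFrame({"a": [1.5]}, index=[1])  # float64 values
df.update(other)                               # internally runs self.loc[:, "a"] = ... with a
                                               # float64 result, which triggers the FutureWarning
print(df)                                      # the update itself still works: a = [1.0, 1.5, 3.0]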
I'm getting this with both:
df.iloc[slice.index] = slice
and the equivalent (in my case):
df.update(slice)
Additionally, neither line's syntax matches the warning, which just adds to the confusion. It is clear from this bug report that the warning itself should not be showing regardless.
This StackOverflow entry directly references the df.update FutureWarning bug: https://stackoverflow.com/questions/74342791/pandas-dataframe-upsert-futurewarning-when-doing-dataframe-update-on-dataframe
Seeing as this warning shows up in places where users can't do anything about it, perhaps we should consider removing the warning?
I think most users don't care too much about whether the operation is truly in-place or not, so just making the breaking change in 2.0.0 seems unlikely to do much harm, especially if:
- it's clearly documented
- the current behaviour is inconsistent anyway
Furthermore, I don't think we should encourage filterwarnings, because then people risk missing warnings about breaking changes that are much more likely to cause bugs (e.g. the numeric_only default).
Seeing as this warning shows up in places where users can't do anything about it, perhaps we should consider removing the warning? [...]
I'd second this idea. If this is only a breaking change in the next major release, in my opinion, this would be an acceptable and preferred approach for me. I maintain a project that heavily depends on Pandas, and for some internal operations I get a few dozen warnings because of this. I am basically forced to silence them in some way or another, which is rather annoying. I suppose that there is no perfect solution, but I'd prefer to not have this warning given what the advantages and disadvantages are.
perhaps we should consider removing the warning?
Works for me.
awesome - if anyone here wants to make a PR, that'd be welcome https://pandas.pydata.org/docs/development/contributing.html
take
Should the fix also include a test in tests subdirectory to confirm that a warning is not raised?
yeah, you could use with tm.assert_produces_warning(None) for that (we should really run the whole test suite with -W error, but we're not there yet, so for now let's explicitly assert no warning is raised)
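A rough sketch of what such a test could look like (the test name is only a suggestion):
import numpy as np
import pandas as pd
import pandas._testing as tm

def test_loc_full_column_setitem_no_warning():
    df = pd.DataFrame(np.arange(4).reshape(2, 2), columns=["a", "b"])
    new = np.array([10, 11], dtype=np.int16)
    # assert that the full-column .loc assignment no longer emits any warning
    with tm.assert_produces_warning(None):
        df.loc[:, "a"] = new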
Hi @MarcoGorelli,
I have done some analysis as to why the warning is being raised for a loc call, and here are some findings.
- The dunder method __setitem__ for "loc" and "iloc" has a call to the function iloc._setitem_with_indexer(indexer, value, self.name) in pandas/core/indexing.py.
- It in turn calls _setitem_single_block, in which, if the values for all the rows of a column are being set (as is the case in this issue), _setitem_single_column gets called:

info_axis = self.obj._info_axis_number
item_labels = self.obj._get_axis(info_axis)
if isinstance(indexer, tuple):
    # if we are setting on the info axis ONLY
    # set using those methods to avoid block-splitting
    # logic here
    if (
        self.ndim == len(indexer) == 2
        and is_integer(indexer[1])
        and com.is_null_slice(indexer[0])
    ):
        col = item_labels[indexer[info_axis]]
        if len(item_labels.get_indexer_for([col])) == 1:
            # e.g. test_loc_setitem_empty_append_expands_rows
            loc = item_labels.get_loc(col)
            # Go through _setitem_single_column to get
            # FutureWarning if relevant.
            self._setitem_single_column(loc, value, indexer[0])
            return

- As per the pull request referenced in the comments above, #45333, the warning was added in the _setitem_single_column function:

warnings.warn(
    "In a future version, df.iloc[:, i] = newvals will attempt "
    "to set the values inplace instead of always setting a new "
    "array. To retain the old behavior, use either "
    "df[df.columns[i]] = newvals or, if columns are non-unique, "
    "df.isetitem(i, newvals)",
    FutureWarning,
    stacklevel=find_stack_level(),
)

- This function _setitem_single_column is called for both "loc" and "iloc" from multiple places.
Please let me know if my understanding is correct.
My question is: should the warning call inside this function be removed entirely? Or, since the warning is related to iloc, should an if statement be added so the warning is raised only for an iloc call?
Do we still want to consider removing the warning?
If so, then ideally we should do that for 1.5.3?
I'd highly welcome that. But the PR around that seems stalled. Also, as I read the discussion here and in the PR, there doesn't really seem to be a consensus about what to do with this warning. (Remove it, change it, ...)
.... as I read the discussion here and in the PR, there doesn't really seem to be a consensus about what to do with this warning. (Remove it, change it, ...)
They must undo this warning because it creates highly undesirable consequences.
Next, they should book and complete an Ayahuasca therapy session because they obviously cannot resolve this issue at their current level of consciousness and understanding.
@max0x7ba please refrain from such comments (the second paragraph), that's not helpful at all
In the PR, it was suggested we could also turn the FutureWarning into a DeprecationWarning (https://github.com/pandas-dev/pandas/pull/50044#discussion_r1039821875). That would at least make it less noisy for end users (i.e. if they use a method from a package that triggers this, they won't see it by default).
So I think there are basically three options:
- Keep the warning, but fix and clarify the message
- Change the warning type to DeprecationWarning (+ also fix the message)
- Remove the warning, and make the breaking change in 2.0
1 is not really an option, since we can not correctly guess all cases where the behavior would change (I think)
I'd be in favour of a DeprecationWarning; removing is a bit weird imo, since this signals to users that we don't want to do this change, but it's already enforced on main
1 is not really an option, since we can not correctly guess all cases where the behavior would change (I think)
It is an option by expanding the message to make it correct for all those different cases ...
removing is a bit weird imo, since this signals to users that we don't want to do this change, but it's already enforced on main
For users that already bumped into the warning, and then don't see it anymore at some point, they might indeed think it is solved and they don't need to do anything. But 1) we can clearly communicate in the whatsnew that this is not the reason the warning is removed (I know, not everyone reads this), and 2) for many users that still have to update to 1.5 that's not an issue.
Making it a deprecation warning is fine, imo. Then, as a package developer, I can suppress it from my logs once I have verified it's not a problem, and users won't be bothered. I think the main problem right now is that end users get spammed with this warning for code that they have no control over and that might be perfectly fine.