pandas icon indicating copy to clipboard operation
pandas copied to clipboard

BUG: DataFrame.at allows adding new rows to a DataFrame

Open rhshadrach opened this issue 2 years ago • 8 comments

df = pd.DataFrame({'a': [1, 2]})
df.at[5, 'a'] = 6
print(df)

#      a
# 0  1.0
# 1  2.0
# 5  6.0

The docstring of DataFrame.at says this should raise a KeyError. This method shouldn't allow the addition of rows because it is ill-performant and makes .at both an indexing and reshaping method.

This may be related to https://github.com/pandas-dev/pandas/issues/48224, but seems to me to be a separate issue (at least, from a user perspective).

rhshadrach avatar Aug 31 '22 01:08 rhshadrach

take

srotondo avatar Aug 31 '22 23:08 srotondo

@rhshadrach DataFrame.loc also seems to exhibit the same behavior here, allowing the addition of a row. Unless I am very much mistaken, that's probably another related bug as well, right? I think they are probably caused by the same thing, so fixing one should fix the other, but I want to be sure the behavior in .loc is in fact a bug before I start making changes.

srotondo avatar Sep 02 '22 00:09 srotondo

The docstring of DataFrame.at says this should raise a KeyError.

On the other hand, the user guide contradicts this .. (https://pandas.pydata.org/docs/user_guide/indexing.html#fast-scalar-value-getting-and-setting):

at may enlarge the object in-place as above if the indexer is missing.

In addition, it also seems we basically have been allowing this for a long time (or forever?)


Personally, I would prefer limiting at to just accessing (or setting) existing scalar values.

jorisvandenbossche avatar Sep 02 '22 14:09 jorisvandenbossche

The KeyError mention in the docstring was added in https://github.com/pandas-dev/pandas/pull/20290, but there was no question about that at the time. That PR does lead me to a similar issue, though: https://github.com/pandas-dev/pandas/issues/46722, where it seems was decided that the docstring is incorrect (it also mentions it not tested, though, not sure that is still the case)

jorisvandenbossche avatar Sep 02 '22 14:09 jorisvandenbossche

Thanks @jorisvandenbossche - from your comments I think this should be deprecated rather than considered a bugfix.

@srotondo - I do think the same should apply to .loc (i.e. setting a value that doesn't exist should raise). However, .loc is designed to be much more flexible, so I feel less certain here. It'd be good to get the thoughts of others.

rhshadrach avatar Sep 07 '22 21:09 rhshadrach

@srotondo - I do think the same should apply to .loc (i.e. setting a value that doesn't exist should raise). However, .loc is designed to be much more flexible, so I feel less certain here. It'd be good to get the thoughts of others.

@rhshadrach After reading over the code some more, I see that it is very well established that .loc should be able to add new locations in a DataFrame, so I don't think this is an issue.

I'm currently working on a PR that will enforce this behavior in .at, but I have to fix some tests that are failing due to changed behavior first.

srotondo avatar Sep 07 '22 23:09 srotondo

I do think the same should apply to .loc (i.e. setting a value that doesn't exist should raise). However, .loc is designed to be much more flexible, so I feel less certain here. It'd be good to get the thoughts of others.

Yes, we certainly have intentionally allowed .loc to "enlarge" while setting, so if we want to change this, that would require a larger discussion.

(personally, I think it would be nice to separate the concepts of setting existing values and enlarging / adding labels, but adding yet another indexer that would allow that is also not very compelling)

jorisvandenbossche avatar Sep 08 '22 08:09 jorisvandenbossche

There is no activity here for the last 2 months, let's close this issue for now and re-open if needed.

Faraz-Tab avatar Apr 27 '24 15:04 Faraz-Tab