pandas
pandas copied to clipboard
BUG: DataFrame.at allows adding new rows to a DataFrame
df = pd.DataFrame({'a': [1, 2]})
df.at[5, 'a'] = 6
print(df)
# a
# 0 1.0
# 1 2.0
# 5 6.0
The docstring of DataFrame.at
says this should raise a KeyError. This method shouldn't allow the addition of rows because it is ill-performant and makes .at
both an indexing and reshaping method.
This may be related to https://github.com/pandas-dev/pandas/issues/48224, but seems to me to be a separate issue (at least, from a user perspective).
take
@rhshadrach DataFrame.loc
also seems to exhibit the same behavior here, allowing the addition of a row. Unless I am very much mistaken, that's probably another related bug as well, right? I think they are probably caused by the same thing, so fixing one should fix the other, but I want to be sure the behavior in .loc
is in fact a bug before I start making changes.
The docstring of
DataFrame.at
says this should raise a KeyError.
On the other hand, the user guide contradicts this .. (https://pandas.pydata.org/docs/user_guide/indexing.html#fast-scalar-value-getting-and-setting):
at
may enlarge the object in-place as above if the indexer is missing.
In addition, it also seems we basically have been allowing this for a long time (or forever?)
Personally, I would prefer limiting at
to just accessing (or setting) existing scalar values.
The KeyError mention in the docstring was added in https://github.com/pandas-dev/pandas/pull/20290, but there was no question about that at the time. That PR does lead me to a similar issue, though: https://github.com/pandas-dev/pandas/issues/46722, where it seems was decided that the docstring is incorrect (it also mentions it not tested, though, not sure that is still the case)
Thanks @jorisvandenbossche - from your comments I think this should be deprecated rather than considered a bugfix.
@srotondo - I do think the same should apply to .loc
(i.e. setting a value that doesn't exist should raise). However, .loc
is designed to be much more flexible, so I feel less certain here. It'd be good to get the thoughts of others.
@srotondo - I do think the same should apply to
.loc
(i.e. setting a value that doesn't exist should raise). However,.loc
is designed to be much more flexible, so I feel less certain here. It'd be good to get the thoughts of others.
@rhshadrach After reading over the code some more, I see that it is very well established that .loc
should be able to add new locations in a DataFrame
, so I don't think this is an issue.
I'm currently working on a PR that will enforce this behavior in .at
, but I have to fix some tests that are failing due to changed behavior first.
I do think the same should apply to
.loc
(i.e. setting a value that doesn't exist should raise). However,.loc
is designed to be much more flexible, so I feel less certain here. It'd be good to get the thoughts of others.
Yes, we certainly have intentionally allowed .loc
to "enlarge" while setting, so if we want to change this, that would require a larger discussion.
(personally, I think it would be nice to separate the concepts of setting existing values and enlarging / adding labels, but adding yet another indexer that would allow that is also not very compelling)
There is no activity here for the last 2 months, let's close this issue for now and re-open if needed.