pandas2

.loc without reindexing

Open shoyer opened this issue 8 years ago • 0 comments

Currently, there is no way to index with a list of labels in pandas without inserting NA for labels that are missing from the index. It would be nice to make this possible, either by adding a variant of .loc that raises KeyError in such cases or by changing the behavior of .loc altogether.
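To make the "variant of .loc that raises KeyError" concrete, here is a minimal sketch written as a plain function. The name `strict_loc` is hypothetical and not part of pandas; it only shows the intended semantics, assuming a missing label should always be an error and never be filled with NA.

```python
import pandas as pd


def strict_loc(obj, labels):
    """Select `labels` from the index of `obj`, failing if any label is absent.

    `strict_loc` is a hypothetical helper for illustration, not a pandas API.
    """
    labels = list(labels)
    # Check membership first so the result never depends on how .loc
    # itself treats missing labels.
    missing = [label for label in labels if label not in obj.index]
    if missing:
        raise KeyError(f"labels not found in index: {missing}")
    return obj.loc[labels]


s = pd.Series([1, 2, 3], index=["a", "b", "c"])
subset = strict_loc(s, ["a", "c"])  # every label exists, so this succeeds
```

Calling `strict_loc(s, ["a", "z"])` raises KeyError rather than returning a row of NA for "z".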

In xarray, .loc only works with pre-existing index labels, both for getitem (df.loc[key]) and for assignment (df.loc[key] = values); inserting new columns is OK. Reindexing behavior can still be obtained with explicit calls to .reindex.
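As a small illustration of the explicit route, the sketch below uses pandas' own `.reindex` (the series and labels are made up for the example). Asking for a label that is not in the index is then a deliberate request for a filled value, not a side effect of indexing.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# "z" is not in the index, so reindex fills its slot with NaN.
expanded = s.reindex(["a", "z"])

# The fill value can be chosen explicitly instead of defaulting to NaN.
filled = s.reindex(["a", "z"], fill_value=0)
```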

Conceivably, we could make things work in the same way for pandas. Two major implications of such a change:

  1. Significantly simpler indexing code. All the logic for mapping indexers to positions in xarray fits in about 50 lines of code. In contrast, pandas/core/indexing.py is some of the trickiest code in pandas, in part because it handles cases like inserting NAs and in part because it tries to handle all possible variations of indexing with minimal code duplication. I don't envy anyone who takes on the task of translating such logic to C++.
  2. It's harder to write code that is entirely at odds with pandas's columnar data model. You can no longer do silly things like creating an empty DataFrame and filling it in later, e.g.,
df = pd.DataFrame()
for row, col, value in data:
    df.loc[row, col] = value

In my view this is a positive, but it would certainly be a big backwards compatibility break.
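For comparison, one columnar way to get the same result is to collect the triples first and build the frame in a single step. This is only a sketch: the contents of `data` are invented here, and it assumes each (row, col) pair appears at most once, which `pivot` requires.

```python
import pandas as pd

# Hypothetical (row, col, value) triples standing in for `data`.
data = [("r1", "a", 1.0), ("r1", "b", 2.0), ("r2", "a", 3.0)]

# Build a long-format table in one step, then reshape it to wide format.
records = pd.DataFrame(data, columns=["row", "col", "value"])
df = records.pivot(index="row", columns="col", values="value")
```

Cells with no corresponding triple, such as ("r2", "b") here, come out as NaN.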

shoyer, Sep 09 '16 04:09