modin
How to use `.loc` in case of multi-indexed dataframes?
- Python 3.8.2
- Modin 0.8.2
- Pandas 1.1.4
Question
Hi,
In the example below, the Modin version raises an error when indexing a multi-index DataFrame with `.loc`, while pandas itself does not.
def example():
    df = pd.DataFrame(
        [["bar", 1, "1"], ["bar", 2, "2"], ["foo", 1, "3"], ["foo", 2, "4"]],
        columns=["first", "second", "data"],
    )
    df = df.set_index(["first", "second"])
    print(df.loc[("bar"), slice(None), :])
import pandas as pd
example()
#              data
# first second
# bar   1         1
#       2         2
from modin import pandas as pd
example()
# ...
# IndexingError: Too many indexers
What would be the right way of indexing using Modin?
Full error log
---------------------------------------------------------------------------
IndexingError                             Traceback (most recent call last)
~/path/script.py in <module>
      9 example()
     10 from modin import pandas as pd
---> 11 example()

~/path/script.py in example()
      5     )
      6     df = df.set_index(["first", "second"])
----> 7     print(df.loc[("bar"), slice(None), :])
      8 import pandas as pd
      9 example()

~/path/.venv/lib/python3.8/site-packages/modin/pandas/indexing.py in __getitem__(self, key)
    509         if callable(key):
    510             return self.__getitem__(key(self.df))
--> 511         row_loc, col_loc, ndim, self.row_scaler, self.col_scaler = _parse_tuple(key)
    512         if isinstance(row_loc, slice) and row_loc == slice(None):
    513             # If we're only slicing columns, handle the case with `__getitem__`

~/path/.venv/lib/python3.8/site-packages/modin/pandas/indexing.py in _parse_tuple(tup)
    207             col_loc = tup[1]
    208         if len(tup) > 2:
--> 209             raise IndexingError("Too many indexers")
    210     else:
    211         row_loc = tup

IndexingError: Too many indexers
This is a bug in the current `loc` implementation. The key `("bar"), slice(None), :` reaches `loc` as a tuple of three indexers, and `_parse_tuple` raises `IndexingError` for any tuple longer than two.
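To see why `loc` ends up with three indexers, it may help to look at the key Python builds from the subscript. The sketch below uses a hypothetical `KeyRecorder` class (not part of pandas or Modin) whose `__getitem__` simply returns whatever key it is given. Note that `("bar")` is just the string `"bar"`, since parentheses alone do not make a tuple.

```python
class KeyRecorder:
    """Hypothetical helper: returns the key passed to the subscript unchanged."""

    def __getitem__(self, key):
        return key


rec = KeyRecorder()

# The subscript from the question: a flat tuple of three indexers.
three = rec[("bar"), slice(None), :]

# The row key wrapped in its own tuple: two indexers (rows, columns).
two = rec[("bar", slice(None)), :]

print(len(three), three)
print(len(two), two)
```

The first key has length 3, which is what trips the `len(tup) > 2` check shown in the traceback; the second has length 2, with the row part carried as a nested tuple.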
As a temporary workaround, write the row key as an explicit tuple so that `loc` receives two indexers instead of three. This works: `df.loc[("bar", slice(None)), :]`.
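A minimal sketch of the workaround on the DataFrame from the question, run here against plain pandas so it is self-contained. The assumption is that swapping the import for `from modin import pandas as pd` exercises the same key on Modin. The `pd.IndexSlice` spelling is a pandas convenience that builds the same row tuple; it is shown only as an alternative way to write the key.

```python
import pandas as pd  # assumption: replace with `from modin import pandas as pd` to try Modin

df = pd.DataFrame(
    [["bar", 1, "1"], ["bar", 2, "2"], ["foo", 1, "3"], ["foo", 2, "4"]],
    columns=["first", "second", "data"],
).set_index(["first", "second"])

# Row key written as one tuple, so `loc` gets exactly two indexers: (rows, columns).
workaround = df.loc[("bar", slice(None)), :]

# pd.IndexSlice builds the same row tuple ("bar", slice(None)) with slice syntax.
idx = pd.IndexSlice
same = df.loc[idx["bar", :], :]

print(workaround)
```

Both selections keep the two index levels and return the two `bar` rows.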
The following keys don't work in Modin's `df.loc` either and should be covered by the `loc` tests:
- df.loc[("bar", 1)]
- df.loc[("bar", 1) :]
- df.loc[("bar"), 1]
- df.loc[("bar"), 1, :]
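One way such cases could be checked is to run each key against a pandas reference frame and a candidate frame and compare whether both succeed or fail the same way. The sketch below is only an illustration: `loc_outcome` is a hypothetical helper, not part of Modin's test suite, and the candidate here is a pandas copy standing in for a Modin DataFrame so the snippet runs without Modin installed.

```python
import pandas as pd


def loc_outcome(frame, key):
    """Hypothetical helper: "ok" if frame.loc[key] succeeds, else the exception type name."""
    try:
        frame.loc[key]
        return "ok"
    except Exception as exc:
        return type(exc).__name__


reference = pd.DataFrame(
    [["bar", 1, "1"], ["bar", 2, "2"], ["foo", 1, "3"], ["foo", 2, "4"]],
    columns=["first", "second", "data"],
).set_index(["first", "second"])

# Stand-in for a Modin frame; with Modin installed, build the same frame from modin.pandas.
candidate = reference.copy()

keys = [
    ("bar", 1),                # df.loc[("bar", 1)]; df.loc[("bar"), 1] passes the same key
    slice(("bar", 1), None),   # df.loc[("bar", 1) :] as written in the list above
    ("bar", 1, slice(None)),   # df.loc[("bar"), 1, :]
]

# Keys where the two frames disagree on success or on the exception type.
mismatches = [
    key for key in keys
    if loc_outcome(reference, key) != loc_outcome(candidate, key)
]
print(mismatches)
```

This only compares success and exception type; a fuller test would also compare the returned values.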
This issue has been mentioned on Modin Discuss. There might be relevant details there:
https://discuss.modin.org/t/support-for-multi-index/193/1
I am still able to reproduce this on the latest master.