modin
Calling df.loc with multiple arguments results in KeyError
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS Version 11.6.4
- Modin version (modin.__version__): 0.14.0
- Python version: Python 3.8.11
- Code we can use to reproduce:
import modin.pandas as pd
import numpy as np
arrays = [
np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
np.array(["one", "two", "one", "two", "one", "two", "one", "two"]),
]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)
df.loc['bar', 'one']
Resulting Error:
KeyError Traceback (most recent call last)
<ipython-input-7-5557f8ed36a3> in <module>
----> 1 df.loc['bar', 'one']
~/Desktop/modin/modin/pandas/indexing.py in __getitem__(self, key)
636 return self._handle_boolean_masking(row_loc, col_loc)
637
--> 638 row_lookup, col_lookup = self._compute_lookup(row_loc, col_loc)
639 result = super(_LocIndexer, self).__getitem__(row_lookup, col_lookup, ndim)
640 if isinstance(result, Series):
~/Desktop/modin/modin/pandas/indexing.py in _compute_lookup(self, row_loc, col_loc)
843 else axis_loc
844 )
--> 845 raise KeyError(missing_labels)
846
847 if isinstance(axis_lookup, pandas.Index) and not is_range_like(axis_lookup):
KeyError: array(['one'], dtype='<U3')
Expected Output (with pandas):
0 0.395674
1 -0.426304
2 0.273483
3 -0.702982
Name: (bar, one), dtype: float64
Describe the problem
Calling df.loc with multiple arguments on a DataFrame with a multi-index causes Modin to believe there are missing labels, so it raises a KeyError.
Source code / logs
@naren-ponder do you find the behavior strange? I would have expected that the tuple has to be passed explicitly to work with the multi-index, like df.loc[('bar', 'one')].
If this behavior is wrong in pandas itself, maybe we should not repeat it?
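For context on the explicit-tuple question: in plain Python, a comma-separated subscript is already packed into a tuple before `__getitem__` is called, so an indexer cannot tell the two spellings apart. A minimal sketch, using a hypothetical `KeyEcho` class that is not part of pandas or Modin:

```python
class KeyEcho:
    """Hypothetical helper: returns whatever key the subscript passes in."""

    def __getitem__(self, key):
        return key


echo = KeyEcho()

# Both spellings reach __getitem__ as the same tuple object shape.
implicit = echo["bar", "one"]
explicit = echo[("bar", "one")]

print(type(implicit).__name__, implicit == explicit)
```

This only illustrates the language rule; how pandas then interprets that tuple (row key versus row and column keys) is decided inside its own indexing code.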
@anmyachev The "expected output" section I indicated above is what happens when you run that snippet of code with pandas. So given that we want to mirror the pandas behavior, I think this is a bug that should be fixed. Perhaps I am misunderstanding your question?
@naren-ponder In general you are right. But it seemed to me that there was already a precedent when we issued a warning for users that Modin's behavior in such and such a case does not coincide with the behavior of pandas, because the behavior of pandas is erroneous. @modin-project/modin-core do you remember this case? Or am I confusing something?
The behavior of pandas in this case is not erroneous; I checked the docs. So we definitely need to fix this case.
However, the previous question is still relevant.
I got the same error, thus upvoting this issue.
@anmyachev, if Modin's behavior does not match the pandas behavior, we issue a warning like this: https://github.com/modin-project/modin/blob/f41432c1c746c6a6186c376594c6c7f7dd24cdb5/modin/core/storage_formats/pandas/query_compiler.py#L2038
@alvin-chang An easy workaround for this issue would be to separate out the calls to .loc. For instance in the case listed above you could do df.loc['bar'].loc['one']. This should unblock you while we work towards putting in a fix.
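A rough sketch of that workaround, written against plain pandas so the expected semantics are clear. The assumption is that swapping the import for `import modin.pandas as pd` gives the same chained result; the seeded generator is only there to make the example repeatable and is not part of the original report.

```python
import numpy as np
import pandas as pd  # assumption: with Modin this would be `import modin.pandas as pd`

arrays = [
    np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
    np.array(["one", "two", "one", "two", "one", "two", "one", "two"]),
]
rng = np.random.default_rng(0)  # seeded only for repeatability
df = pd.DataFrame(rng.standard_normal((8, 4)), index=arrays)

# The call from the report: both index levels in a single .loc.
direct = df.loc["bar", "one"]

# The workaround: select the outer level first, then the inner level.
chained = df.loc["bar"].loc["one"]

same_values = np.allclose(direct.to_numpy(), chained.to_numpy())
print(same_values)
print(direct.name, chained.name)
```

One difference to keep in mind: the values match, but the chained result is named after the inner label only ('one'), whereas the single call in pandas names the Series with the full tuple ('bar', 'one').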
This works as of commit 80c7891de7b6754a08d886895a110c2512c88e89.