Implement IndexSlice selection for MultiIndex
When using a MultiIndex in pandas, it is customary to select with a tuple of slices or values, usually via the pd.IndexSlice[]
syntactic sugar. This is not currently implemented in Koalas.
See https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html for more examples.
import numpy as np
import pandas as pd
import databricks.koalas as ks
df = pd.DataFrame(
    np.zeros((4, 4)),
    index=pd.MultiIndex.from_product([('a', 'b'), ('c', 'd')]),
    columns=pd.MultiIndex.from_product([('A', 'B'), ('C', 'D')]),
)
kdf = ks.from_pandas(df)
df.loc[(slice(None), 'c'), :] # OK
df.loc[:, (slice(None), 'C')] # OK
kdf.loc[(slice(None), 'c'), :] # ERROR
kdf.loc[:, (slice(None), 'C')] # ERROR
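For reference, here is a pandas-only sketch of the same two selections written with pd.IndexSlice. It assumes a frame shaped like the one above, filled with np.arange instead of zeros so the selected rows and columns are distinguishable; the variable names are only illustrative. pd.IndexSlice simply builds the tuple-of-slices key, so both spellings should select the same data.

```python
import numpy as np
import pandas as pd

# Same shape as the frame in the report; distinct values make the result easier to read.
df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=pd.MultiIndex.from_product([("a", "b"), ("c", "d")]),
    columns=pd.MultiIndex.from_product([("A", "B"), ("C", "D")]),
)

idx = pd.IndexSlice

# idx[:, "c"] evaluates to the plain tuple (slice(None), "c").
rows_tuple = df.loc[(slice(None), "c"), :]
rows_sugar = df.loc[idx[:, "c"], :]

cols_tuple = df.loc[:, (slice(None), "C")]
cols_sugar = df.loc[:, idx[:, "C"]]

print(rows_sugar)
print(cols_sugar)
```

Ideally the Koalas equivalents (kdf.loc[idx[:, "c"], :] and kdf.loc[:, idx[:, "C"]]) would accept the same keys.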
Also missing:
kdf.loc(axis=0)[:, 'd']
which currently raises:
TypeError: 'LocIndexer' object is not callable
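The TypeError suggests that the indexer supports item access but defines no __call__. As a rough illustration of the calling pattern only, here is a self-contained sketch in plain Python. SketchLocIndexer and _matches are hypothetical names, not Koalas or pandas code, and the sketch only filters a list of tuple labels along axis 0.

```python
class SketchLocIndexer:
    """Hypothetical stand-in for a .loc indexer over tuple row labels."""

    def __init__(self, labels, axis=None):
        self._labels = list(labels)
        self._axis = axis

    def __call__(self, axis=None):
        # Return a new indexer bound to the requested axis; self is left unchanged.
        return SketchLocIndexer(self._labels, axis=axis)

    def __getitem__(self, key):
        if self._axis != 0:
            raise NotImplementedError("this sketch only handles axis=0")
        if not isinstance(key, tuple):
            key = (key,)
        return [label for label in self._labels if _matches(label, key)]


def _matches(label, key):
    # A full slice matches any value at that level; any other key must equal the label.
    return all(
        (isinstance(k, slice) and k == slice(None)) or k == v
        for k, v in zip(key, label)
    )


loc = SketchLocIndexer([("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")])
selected = loc(axis=0)[:, "d"]
print(selected)
```

With axis fixed by the call, the whole key can be read as one selector per index level, which is what makes the shorter [:, 'd'] spelling unambiguous.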