ibis
support for pd.MultiIndex on rows and columns
- Great piece of software!
- Is MultiIndex support planned (at least) for the pandas backend?
The code below...
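import pandas as pd
import ibis

ibis.options.interactive = True

# Labels for each level of the row and column indexes
scenarios = ["sc1", "sc2"]
products = ["p1", "p2"]
variables = ["qty"]  # renamed from `vars` to avoid shadowing the builtin
years = ["yr1", "yr2", "yr3"]
quarters = ["q1", "q2", "q3", "q4"]

# Three-level row index and two-level column index
rows = pd.MultiIndex.from_product([scenarios, products, variables], names=["scenarios", "products", "vars"])
cols = pd.MultiIndex.from_product([years, quarters], names=["years", "quarters"])
df = pd.DataFrame(index=rows, columns=cols)

# Register the DataFrame with the pandas backend and look it up as a table
connection = ibis.pandas.connect({"df": df})
dft = connection.table("df")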
...yields: TypeError: Column names must be strings to use the pandas backend
P.S.: I'd love to see a "universal" MultiIndex API specification!
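A possible workaround, not something proposed in this thread: the error message says column names must be strings, and a column MultiIndex labels each column with a tuple, so one option is to flatten the labels before handing the frame to ibis. The sketch below uses only pandas; `flatten_for_ibis` and the `"_"` separator are hypothetical choices made for illustration, and whether the result then satisfies the pandas backend in a given ibis version is an assumption that has not been checked here.

```python
import pandas as pd


def flatten_for_ibis(df: pd.DataFrame, sep: str = "_") -> pd.DataFrame:
    """Hypothetical helper: return a copy with string column names and no row MultiIndex."""
    flat = df.copy()
    if isinstance(flat.columns, pd.MultiIndex):
        # Join each tuple label, e.g. ("yr1", "q1") -> "yr1_q1"
        flat.columns = [sep.join(map(str, key)) for key in flat.columns]
    else:
        flat.columns = [str(c) for c in flat.columns]
    # Move the (named) row index levels into ordinary columns
    return flat.reset_index()


# Same shape of data as in the report above
rows = pd.MultiIndex.from_product(
    [["sc1", "sc2"], ["p1", "p2"], ["qty"]],
    names=["scenarios", "products", "vars"],
)
cols = pd.MultiIndex.from_product(
    [["yr1", "yr2", "yr3"], ["q1", "q2", "q3", "q4"]],
    names=["years", "quarters"],
)
df = pd.DataFrame(index=rows, columns=cols)

flat = flatten_for_ibis(df)
# `flat` could then be passed where `df` was passed in the snippet above.
```

The columns are flattened before `reset_index()` on purpose: resetting the index while the columns are still a MultiIndex would produce tuple labels for the new columns as well.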
@ams1 Thanks for opening an issue!
I remember during my graduate school days being incredibly fond of column multiindexes and their ability to allow convenient and fast subsetting for commonly-subsetted keys 😃.
Unfortunately they don't really fit into ibis's data model right now and it's unlikely that they ever will.
No other relational-ish system that ibis interacts with exposes anything like a user-facing API for a row index (they do have indexes, of course, but those work behind the scenes), let alone for multiple row indexes. Multiple column indexes are rather novel, and I'm not aware of anything like them outside of pandas-based tools.
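To illustrate the mismatch, here is a minimal pandas-only sketch of how the same information might look in a relational layout, where every index level becomes an ordinary column. The variable names (`wide`, `long_df`), the `"value"` column name, and the placeholder data are assumptions for the example, not anything ibis prescribes.

```python
import pandas as pd

rows = pd.MultiIndex.from_product(
    [["sc1", "sc2"], ["p1"]], names=["scenarios", "products"]
)
cols = pd.MultiIndex.from_product(
    [["yr1", "yr2"], ["q1", "q2"]], names=["years", "quarters"]
)
# Wide frame with row and column MultiIndexes, filled with a placeholder value
wide = pd.DataFrame(1.0, index=rows, columns=cols)

# melt() turns the column levels into the "years" and "quarters" columns;
# ignore_index=False keeps the row index so reset_index() can turn its
# levels into the "scenarios" and "products" columns.
long_df = wide.melt(ignore_index=False, value_name="value").reset_index()
```

In this layout each row is one (scenario, product, year, quarter) observation, which is the shape a SQL-style backend would typically expect.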
That said, it's possible there might be a way to support using DataFrames with MultiIndexes, but it's not something that we'll prioritize any time soon. If someone were to open a pull request 😉 we'd review it!
@ams1 Thanks again for opening this issue! I'm going to close it out for now given that we're unlikely to implement this anytime soon.
Keep 'em coming!