ibis
perf: `select` Selections not fused when selections are column expressions
Repro
Setup code
import pandas as pd
import ibis
backend = ibis.backends.pandas.Backend()
conn = backend.connect({})
table = conn.from_dataframe(pd.DataFrame({
    'key': [1, 1, 2, 2],
    'value': [10, 30, 20, 40],
}), 't1')
Selections are strings ✔️
If the final select is passed strings (column names) for its selections, the two selection operations are fused into one node as expected:
table_selected_1 = table.select(['key', 'value'])
table_selected_2 = table_selected_1.select(['value'])
print(table_selected_2)
r0 := PandasTable: t1
  key int64
  value int64

Selection[r0]
  selections:
    value: r0.value
Selections are column expressions ❌
If the final select is passed column expression(s) for its selections, the two selection operations are not fused (I think this is unexpected):
table_selected_1 = table.select(['key', 'value'])
table_selected_2 = table_selected_1.select([table_selected_1['value']])
print(table_selected_2)
r0 := PandasTable: t1
  key int64
  value int64

r1 := Selection[r0]
  selections:
    key: r0.key
    value: r0.value

Selection[r1]
  selections:
    value: r1.value
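To make the expected behaviour concrete, here is a toy sketch of what "fusing" means for the case above. It does not use ibis and is not how ibis implements fusion internally; `Table`, `Selection`, and `fuse` are hypothetical names made up for illustration. It only handles selections that are plain references to a parent column, which is exactly the case in this report.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Table:
    # Hypothetical stand-in for a base table node (like r0 above).
    name: str
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Selection:
    # Hypothetical stand-in for a Selection node.
    # Each entry is (output_name, parent_column_name).
    parent: Union["Table", "Selection"]
    selections: Tuple[Tuple[str, str], ...]


def fuse(node):
    """Collapse Selection-over-Selection into a single Selection.

    Assumes every selection is a plain column reference to its parent,
    so an outer reference can be rewritten in terms of the grandparent.
    """
    if isinstance(node, Table):
        return node
    parent = fuse(node.parent)
    if isinstance(parent, Selection):
        # Map the parent's output names back to the columns they refer to.
        inner = dict(parent.selections)
        rewritten = tuple((out, inner[col]) for out, col in node.selections)
        return Selection(parent.parent, rewritten)
    return Selection(parent, node.selections)


# Mirror the repro: r0 -> r1 (key, value) -> outer selection (value).
r0 = Table("t1", ("key", "value"))
r1 = Selection(r0, (("key", "key"), ("value", "value")))
unfused = Selection(r1, (("value", "value"),))
fused = fuse(unfused)
```

In this sketch `fused` is a single `Selection` whose parent is `r0`, which matches the shape of the string-selection output. The column-expression case would ideally reduce to the same thing, since `table_selected_1['value']` is just a reference to a column of the parent selection.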
I don't think this is strictly a bug, since it doesn't affect correctness.
SQL will be uglier and the pandas/dask backends will probably do some unnecessary computation, but I don't think we'll prioritize this at the moment.
@timothydijamco Feel free to submit a PR for this! In the meantime I'm going to close this out since it's become stale.