perf: `select` Selections not fused when selections are column expressions

Open timothydijamco opened this issue 3 years ago • 1 comments

Repro

Setup code

import pandas as pd

import ibis

# Create an in-memory pandas backend with no pre-registered tables
backend = ibis.backends.pandas.Backend()
conn = backend.connect({})

table = conn.from_dataframe(pd.DataFrame({
    'key': [1, 1, 2, 2],
    'value': [10, 30, 20, 40],
}), 't1')

Selections are strings ✔️

If the final select is passed strings (column names) for its selections, the two selection operations are fused into one node as expected:

table_selected_1 = table.select(['key', 'value'])
table_selected_2 = table_selected_1.select(['value'])
print(table_selected_2)
r0 := PandasTable: t1
  key   int64
  value int64

Selection[r0]
  selections:
    value: r0.value

Selections are column expressions ❌

If the final select is passed column expressions for its selections, the two selection operations are not fused. I think this is unexpected, because the expression selects the same column as the string version above:

table_selected_1 = table.select(['key', 'value'])
table_selected_2 = table_selected_1.select([table_selected_1['value']])
print(table_selected_2)
r0 := PandasTable: t1
  key   int64
  value int64

r1 := Selection[r0]
  selections:
    key:   r0.key
    value: r0.value

Selection[r1]
  selections:
    value: r1.value
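
To make the expected behavior concrete, here is a minimal toy model of selection fusion. It is only a sketch and not how ibis implements it: `Table`, `Column`, `Selection`, `select`, and `fuse` are hypothetical names invented for illustration. The assumption is that an outer selection can be fused whenever each of its columns is a plain reference to a column of the inner selection, regardless of whether it was passed as a string or as a column expression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    # Hypothetical stand-in for a backend table such as PandasTable
    name: str
    columns: tuple


@dataclass(frozen=True)
class Column:
    # A reference to column `name` of `parent` (a Table or a Selection)
    parent: object
    name: str


@dataclass(frozen=True)
class Selection:
    parent: object
    selections: tuple  # tuple of Column


def fuse(sel):
    """Collapse Selection(Selection(t)) into a single Selection(t) when possible."""
    inner = sel.parent
    if not isinstance(inner, Selection):
        return sel
    inner_by_name = {c.name: c for c in inner.selections}
    rewritten = []
    for c in sel.selections:
        # Only fuse plain references to columns that the inner selection exposes
        if c.parent != inner or c.name not in inner_by_name:
            return sel
        rewritten.append(inner_by_name[c.name])
    return Selection(inner.parent, tuple(rewritten))


def select(parent, exprs):
    """Accept column names (str) or Column expressions, then try to fuse."""
    cols = tuple(Column(parent, e) if isinstance(e, str) else e for e in exprs)
    return fuse(Selection(parent, cols))


t = Table("t1", ("key", "value"))
s1 = select(t, ["key", "value"])
fused_from_str = select(s1, ["value"])
fused_from_expr = select(s1, [Column(s1, "value")])
```

In this toy model both forms normalize to the same `Column` reference before fusion is attempted, so `fused_from_str` and `fused_from_expr` end up as the same single `Selection` directly on `t1`.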

timothydijamco, May 27 '22 18:05

I don't think this is strictly a bug, since it doesn't affect correctness.

The generated SQL will be uglier, and the pandas/dask backends will probably do some unnecessary computation for the extra selection, but I don't think we'll prioritize this at the moment.

cpcloud, May 27 '22 18:05

@timothydijamco Feel free to submit a PR for this! In the meantime I'm going to close this out since it's become stale.

cpcloud, Jan 30 '23 15:01