
bug: `schema.apply_to` should recurse for nested types

Open jcrist opened this issue 3 years ago • 3 comments

Right now we enforce that the output of expr.execute() matches expr.schema() through Schema.apply_to. This is nice since it ensures the result types are the same across backends.

However, right now apply_to doesn't recurse into nested types (e.g. arrays, structs, ...).

One outcome of this issue is that, depending on the backend, a nested array might be represented as a list or an np.ndarray:

postgres

In [1]: import ibis

In [2]: con = ibis.connect("postgres://postgres:postgres@localhost:5432")

In [3]: sql = """
   ...: CREATE TABLE test (x REAL[][]);
   ...: INSERT INTO test VALUES (ARRAY[ARRAY[1, 2], ARRAY[3, 4]]), (ARRAY[ARRAY[4, 5]]);
   ...: """

In [4]: con.raw_sql(sql)
Out[4]: <sqlalchemy.engine.cursor.LegacyCursorResult at 0x7f050403fac0>

In [5]: df = con.table("test").execute()

In [6]: df
Out[6]: 
                          x
0  [[1.0, 2.0], [3.0, 4.0]]
1              [[4.0, 5.0]]

In [7]: df.x.iloc[0]  # it's a list of lists
Out[7]: [[1.0, 2.0], [3.0, 4.0]]

duckdb

In [8]: con = ibis.connect("duckdb://")

In [9]: con.raw_sql(sql)
Out[9]: <sqlalchemy.engine.cursor.LegacyCursorResult at 0x7f050403f610>

In [10]: df = con.table("test").execute()

In [11]: df
Out[11]: 
                          x
0  [[1.0, 2.0], [3.0, 4.0]]
1              [[4.0, 5.0]]

In [12]: df.x.iloc[0]  # it's a list of numpy arrays
Out[12]: [array([1., 2.], dtype=float32), array([3., 4.], dtype=float32)]
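
To illustrate what "recursing" could mean here, below is a minimal sketch in plain Python. It is not ibis's implementation: `to_builtin` and the tuple-based type description (`("array", inner)`, `("struct", {...})`) are hypothetical stand-ins for `Schema.apply_to` and ibis datatypes, chosen only to keep the example self-contained. The idea is to walk the declared type and rebuild each level as a plain list or dict, so the inner container type no longer depends on what the backend driver returned.

```python
def to_builtin(value, dtype):
    """Recursively rebuild `value` using plain Python containers.

    `dtype` is a hypothetical, simplified type description (an assumption
    for this sketch, not an ibis API):
      ("array", inner)            -> list of `inner`
      ("struct", {name: inner})   -> dict with the given fields
      any callable (float, int)   -> scalar conversion
    """
    if value is None:
        # Nulls are preserved at every nesting level.
        return None
    if isinstance(dtype, tuple) and dtype[0] == "array":
        # Works for any iterable: list, tuple, or a 1-D array-like.
        return [to_builtin(item, dtype[1]) for item in value]
    if isinstance(dtype, tuple) and dtype[0] == "struct":
        return {name: to_builtin(value[name], inner)
                for name, inner in dtype[1].items()}
    # Scalar leaf: apply the conversion callable.
    return dtype(value)


# Stand-in for the REAL[][] column from the example above.
REAL_2D = ("array", ("array", float))

# Inner containers of different kinds come out as plain lists of floats.
row = to_builtin([(1, 2), [3, 4]], REAL_2D)
print(row)
```

Because the array branch only iterates over its input, a row made of NumPy arrays (as in the duckdb output) would go through the same path and come out as a list of lists, matching the postgres result. The cost is a full copy of every nested value, which is the data-movement concern a per-backend approach would try to avoid.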

jcrist avatar Sep 08 '22 14:09 jcrist

I wonder if we should address this on a per-backend basis to avoid unnecessary data movement and copying.

cpcloud avatar Sep 08 '22 15:09 cpcloud

Would an additional method for the arrow-based backends be sufficient? Something like

def to_pandas(table: pa.Table, schema: Schema) -> pd.DataFrame:
    ...

which could be more efficient than doing schema.apply_to(table.to_pandas())

jcrist avatar Sep 08 '22 15:09 jcrist

SGTM! Since we're about to release 3.2, I'll mark this for 4.0

cpcloud avatar Sep 09 '22 12:09 cpcloud

This seems to have been fixed; I can no longer reproduce the difference between these two backends.

cpcloud avatar Jul 03 '23 09:07 cpcloud