Support unnest for struct data type

Open duongcongtoai opened this issue 1 year ago • 2 comments

Is your feature request related to a problem or challenge?

Support unnest for the struct data type. Here is an example of this feature in DuckDB:

D CREATE TABLE t1 (s STRUCT(v VARCHAR, i INTEGER));
D INSERT INTO t1 VALUES (ROW('a', 42)),(ROW('b', 43));
D select unnest(s) from t1;
┌─────────┬───────┐
│    v    │   i   │
│ varchar │ int32 │
├─────────┼───────┤
│ a       │    42 │
│ b       │    43 │
└─────────┴───────┘

Describe the solution you'd like

We already have UnnestExec, which supports unnest on the list data type: each item in a list value produces a new row in the output. Unnest for a struct does something quite different: the number of output rows is expected to stay the same as the input, and only the number of output columns increases. Maybe we should add a new executor for this type of operation, such as "StructUnnestExec".
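To make the contrast concrete, here is a minimal std-only sketch of the two behaviours. The function names `unnest_list` and `unnest_struct` are hypothetical and model columns as plain Rust vectors; this is not DataFusion's API and not how UnnestExec is implemented.

```rust
// Hypothetical, std-only model of the two unnest behaviours.
// Columns are plain vectors here; DataFusion itself works on Arrow arrays.

/// List unnest: every element of every list value becomes its own output row,
/// so the number of rows can grow.
fn unnest_list(column: &[Vec<i32>]) -> Vec<i32> {
    column.iter().flatten().copied().collect()
}

/// Struct unnest: every field of the struct becomes its own output column,
/// and the number of rows stays the same as the input.
fn unnest_struct(column: &[(String, i32)]) -> (Vec<String>, Vec<i32>) {
    column.iter().cloned().unzip()
}
```

With the two rows from the DuckDB example, `unnest_struct` would return one column holding `a`, `b` and another holding `42`, `43`, still two rows each, while `unnest_list` on `[[1, 2], [3]]` turns two input rows into three output rows.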

Describe alternatives you've considered

No response

Additional context

No response

duongcongtoai avatar Apr 27 '24 11:04 duongcongtoai

I would like to work on this

duongcongtoai avatar May 01 '24 10:05 duongcongtoai

List of problems (TBD)

  1. ExprSchemable's get_type() returns a single type, while the expression unnest(some_struct) returns a value table (multiple types): https://github.com/apache/datafusion/blob/8b4a8e6b157c007e7988f715cb4b693578438f8b/datafusion/expr/src/expr_schema.rs#L124
  2. Implicit unnest on a struct column when a subfield access expression appears in the select expressions. Example query:
D create table t (s struct(f1 int));
D insert into t values(row(1));
D select s.f1 from t;
┌───────┐
│  f1   │
│ int32 │
├───────┤
│     1 │
└───────┘
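Problem 1 can be illustrated with a small std-only sketch. The `DataType` enum and `unnest_output_columns` function below are hypothetical stand-ins, not DataFusion's types and not necessarily the approach taken in the eventual PR. The point is only that a struct-typed expression expands into several (name, type) pairs, which a function returning one type cannot express.

```rust
// Hypothetical, std-only model; these types are not DataFusion's.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    Struct(Vec<(String, DataType)>),
}

/// Return the output columns produced by unnesting an expression of type `ty`.
/// A struct expands into one column per field; any other type is passed
/// through as a single column under the given name.
fn unnest_output_columns(name: &str, ty: &DataType) -> Vec<(String, DataType)> {
    match ty {
        DataType::Struct(fields) => fields.clone(),
        other => vec![(name.to_string(), other.clone())],
    }
}
```

For the table `t1` above, the struct column `s` would expand into two columns, `v` and `i`, so a planner would need a list of types rather than a single one.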

duongcongtoai avatar May 09 '24 05:05 duongcongtoai

Could everyone please help review this PR: https://github.com/apache/datafusion/pull/10429

duongcongtoai avatar May 21 '24 05:05 duongcongtoai