datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

Error aliasing on double unnest on List[Struct]

Open Jeadie opened this issue 1 year ago • 1 comments

Describe the bug

Using a datafusion table with a column (col) of type List[struct[]]. Calling

SELECT unnest(unnest(col)) as col_name
FROM tbl

Results in the error

Internal error: unnest on struct can only be applied at the root level of select expression.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

To Reproduce

Context

sql> describe test_tbl
+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| column_name | data_type                                                                                                                                                                                                                      | is_nullable |
+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| id          | Utf8                                                                                                                                                                                                                           | YES         |
| labels      | List(Field { name: "item", data_type: Struct([Field { name: "name", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) | YES         |
+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+

Without Alias

sql> select unnest(unnest(labels)) from test_tbl limit 3
+-----------------------------------------+
| unnest(unnest(test_tbl.labels)).name    |
+-----------------------------------------+
| kind/bug                                |
| kind/enhancement                        |
| kind/enhancement                        |
+-----------------------------------------+

With Alias

sql> select unnest(unnest(labels)) as a from test_tbl limit 3
Status { code: Internal, message: "Internal error: unnest on struct can only be applied at the root level of select expression.\nThis was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker", metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Mon, 26 Aug 2024 00:25:51 GMT", "content-length": "0"} }, source: None }

Expected behavior

sql> select unnest(unnest(labels)) as an_alias from test_tbl limit 3
+-----------------------------------------+
| an_alias.name                           |
+-----------------------------------------+
| kind/bug                                |
| kind/enhancement                        |
| kind/enhancement                        |
+-----------------------------------------+

Additional context

For outputs, using spiceai, specifically spice sql.

Jeadie avatar Aug 26 '24 00:08 Jeadie

i think this can be similar to an operation of aliasing an unnest on struct column, and this operation does not make much sense. It's clearer to see if in the example, the struct has two field

> select unnest(struct('a','b'));
+----------------------------------------------------+----------------------------------------------------+
| unnest_placeholder(struct(Utf8("a"),Utf8("b"))).c0 | unnest_placeholder(struct(Utf8("a"),Utf8("b"))).c1 |
+----------------------------------------------------+----------------------------------------------------+
| a                                                  | b                                                  |
+----------------------------------------------------+----------------------------------------------------+

Then adding allias like

select unnest(struct('a','b')) as some_alias;

Does not make sense, I wonder should we find a new syntax to alias for individual unnested fields for struct column

duongcongtoai avatar Jan 27 '25 03:01 duongcongtoai