datafusion
datafusion copied to clipboard
Error aliasing on double unnest on List[Struct]
Describe the bug
Using a datafusion table with a column (col) of type List[struct[]]. Calling
SELECT unnest(unnest(col)) as col_name
FROM tbl
Results in the error
Internal error: unnest on struct can only be applied at the root level of select expression.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
To Reproduce
Context
sql> describe test_tbl
+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| column_name | data_type | is_nullable |
+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
| id | Utf8 | YES |
| labels | List(Field { name: "item", data_type: Struct([Field { name: "name", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }) | YES |
+-------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
Without Alias
sql> select unnest(unnest(labels)) from test_tbl limit 3
+-----------------------------------------+
| unnest(unnest(test_tbl.labels)).name |
+-----------------------------------------+
| kind/bug |
| kind/enhancement |
| kind/enhancement |
+-----------------------------------------+
With Alias
sql> select unnest(unnest(labels)) as a from test_tbl limit 3
Status { code: Internal, message: "Internal error: unnest on struct can only be applied at the root level of select expression.\nThis was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker", metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Mon, 26 Aug 2024 00:25:51 GMT", "content-length": "0"} }, source: None }
Expected behavior
sql> select unnest(unnest(labels)) as an_alias from test_tbl limit 3
+-----------------------------------------+
| an_alias.name |
+-----------------------------------------+
| kind/bug |
| kind/enhancement |
| kind/enhancement |
+-----------------------------------------+
Additional context
i think this can be similar to an operation of aliasing an unnest on struct column, and this operation does not make much sense. It's clearer to see if in the example, the struct has two field
> select unnest(struct('a','b'));
+----------------------------------------------------+----------------------------------------------------+
| unnest_placeholder(struct(Utf8("a"),Utf8("b"))).c0 | unnest_placeholder(struct(Utf8("a"),Utf8("b"))).c1 |
+----------------------------------------------------+----------------------------------------------------+
| a | b |
+----------------------------------------------------+----------------------------------------------------+
Then adding allias like
select unnest(struct('a','b')) as some_alias;
Does not make sense, I wonder should we find a new syntax to alias for individual unnested fields for struct column