Add support for struct type in hash_list aggregation
Describe the enhancement requested
Can support for struct type columns be added to the hash_list aggregation function? In my use case, I would use it to build nested structures for JSON output.
import pyarrow as pa
# source data
table = pa.table(
    {
        "col1": [1, 1, 2, 2, 3],
        "struct_col": [
            {"a": 1, "b": "testa"},
            {"a": 1, "b": "testb"},
            {"a": 2, "b": "testc"},
            {"a": 2, "b": "testd"},
            {"a": 3, "b": "teste"},
        ],
    }
)
# desired output
grouped_table = pa.table(
    {
        "grouped": [1, 2, 3],
        "agg_struct_col": [
            [{"a": 1, "b": "testa"}, {"a": 1, "b": "testb"}],
            [{"a": 2, "b": "testc"}, {"a": 2, "b": "testd"}],
            [{"a": 3, "b": "teste"}],
        ],
    }
)
# using group_by: can this be supported?
grouped = table.group_by("col1").aggregate([("struct_col", "list")])
This is already supported in Polars and DuckDB, among others.
Component(s)
Python