Add support for struct type in hash_list aggregation

Open robhod opened this issue 1 year ago • 0 comments

Describe the enhancement requested

Can support for struct-type columns be added to the hash_list aggregation function? For my particular use case, I'd use it to build nested structures for JSON output.

import pyarrow as pa

# source data
table = pa.table(
    {
        "col1": [1, 1, 2, 2, 3],
        "struct_col": [
            {"a": 1, "b": "testa"},
            {"a": 1, "b": "testb"},
            {"a": 2, "b": "testc"},
            {"a": 2, "b": "testd"},
            {"a": 3, "b": "teste"},
        ],
    }
)

# desired output
grouped_table = pa.table(
    {
        "grouped": [1, 2, 3],
        "agg_struct_col": [
            [{"a": 1, "b": "testa"}, {"a": 1, "b": "testb"}],
            [{"a": 2, "b": "testc"}, {"a": 2, "b": "testd"}],
            [{"a": 3, "b": "teste"}],
        ],
    }
)

# using group_by with the "list" aggregation (hash_list) on the struct column: can this be supported?
grouped = table.group_by("col1").aggregate([("struct_col", "list")])


This kind of list aggregation over struct columns is already supported in Polars, DuckDB, etc.

Component(s)

Python

robhod · Oct 15 '24 08:10