
feat(arrays): collect and concat an array column

Open · DavidSlayback opened this issue 8 months ago · 4 comments

Forgive me for asking here, but I'm struggling to find relevant examples in the docs, and I'm not quite sure how to handle this.

If I want to aggregate a scalar column into an array, I use group_by(...).aggregate(c=t.c.collect()). What is the equivalent when the column is already array-valued? I've tried collect(), flatten().collect(), and unnest().collect(). I see the array type has a concat method, but I can't use concat() directly in the aggregation expression.

DavidSlayback, Mar 14 '25 14:03

Can you give an example of your desired output? Is the goal to concat every array in the column into a single array per group?

If so, we don't have an operation for that (yet).

cpcloud, Mar 14 '25 14:03

Yeah, exactly that. If I have a table like

id  lst
0   [1, 2, 3]
1   [4, 5, 6]
1   [7, 8, 9]

then grouping by id should produce

id  lst
0   [1, 2, 3]
1   [4, 5, 6, 7, 8, 9]

Presumably it would also accept an order_by argument, like collect() does.
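To pin down the requested semantics, here is a plain-Python reference model (not Ibis code and not tied to any backend): rows are grouped by a key, optionally sorted within each group, and each group's arrays are concatenated one level deep instead of being nested, which is what distinguishes this from calling collect() on an array column. The concat_agg function and its order_by callable are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Any, Callable, Hashable, Iterable, Optional


def concat_agg(
    rows: Iterable[dict[str, Any]],
    key: str,
    value: str,
    order_by: Optional[Callable[[dict[str, Any]], Any]] = None,
) -> dict[Hashable, list[Any]]:
    """Concatenate the arrays in column `value` across all rows sharing `key`.

    If `order_by` is given, rows within each group are sorted by it before
    concatenation, mimicking an ordered aggregate.
    """
    # Group rows by key, preserving first-seen key order.
    groups: dict[Hashable, list[dict[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    result: dict[Hashable, list[Any]] = {}
    for k, members in groups.items():
        if order_by is not None:
            members = sorted(members, key=order_by)
        out: list[Any] = []
        for row in members:
            # extend (not append): one level of concatenation, no nesting
            out.extend(row[value])
        result[k] = out
    return result


# The example table from the comment above.
table = [
    {"id": 0, "lst": [1, 2, 3]},
    {"id": 1, "lst": [4, 5, 6]},
    {"id": 1, "lst": [7, 8, 9]},
]
print(concat_agg(table, key="id", value="lst"))
# {0: [1, 2, 3], 1: [4, 5, 6, 7, 8, 9]}
```

Passing order_by controls how the arrays within a group are stitched together; without it, the model keeps input order, which a real backend would not necessarily guarantee.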

DavidSlayback, Mar 14 '25 14:03

Would you be interested in an external contribution? For my own use case I may implement a custom op for BigQuery (ARRAY_CONCAT_AGG), and maybe DuckDB for unit testing, but I could probably find my way around the codebase to cover the other backends too.

DavidSlayback, Mar 14 '25 15:03

Yes, absolutely. Starting with just those two is totally fine. We can fill in the rest as the backends gain support for it or as someone needs it in Ibis.

cpcloud, Mar 14 '25 15:03