[FEAT] Aggregations on List Types
We should support the following aggregations in the list type namespace, for example:
col('x').list.sum()
- [x] Sum
- [x] Mean
- [x] Min
- [x] Max
- [ ] Distinct (List of unique elements)
- [ ] Count_distinct (number of unique elements)
- [ ] Flatten (Flattens List[List[T]] to List[T])
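To make the intended per-row results concrete, here is a minimal plain-Python sketch of what each aggregation above would return for a single list value. This is not Daft code: the helper names `list_aggregations` and `flatten` are hypothetical, and the choice to keep distinct elements in first-seen order is an assumption (the issue does not specify an ordering, nor how nulls or empty lists should behave).

```python
from statistics import mean


def list_aggregations(values):
    """Reference results for one list value (hypothetical helper, not Daft's API)."""
    # Assumption: distinct keeps first-seen order; the issue does not specify one.
    distinct = list(dict.fromkeys(values))
    return {
        "sum": sum(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "distinct": distinct,
        "count_distinct": len(distinct),
    }


def flatten(nested):
    """Flatten List[List[T]] to List[T] (hypothetical helper)."""
    return [item for inner in nested for item in inner]


row = [1, 2, 2, 5]
result = list_aggregations(row)
flat = flatten([[1, 2], [3], []])
```

In Daft itself each of these would be an expression in the list namespace evaluated per row, in the style of the `col('x').list.sum()` example.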
@nsalerni Can you let me know if I missed anything?
@samster25 This covers a good chunk of the use case. Two others I can think of:
- The one I can see missing from the above list would be apply() (i.e. being able to apply some form of custom logic to a list column). It seems like that's covered by https://github.com/Eventual-Inc/Daft/issues/1976?
- I'm not sure if the above would implicitly allow us to support the following, but this would be another simplified example of a use case I'd like to support:
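import daft
from daft import col

df = daft.from_pydict({
    "strings": ["a", "b", "c", "d"],
    "lists": [[1, 1, 1, 1], [1, 1, 1, 1], [2, 2, 2], [2, 2, 2]],
})
# Group rows by the value of the list column and count the rows in each group.
df.groupby('lists').agg([
    (col("lists").alias("list_count"), 'count')
]).collect()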
I'd imagine the output of this looking something like:
lists (Int64) | list_count (UInt64)
------------- | -------------------
[2, 2, 2]     | 2
[1, 1, 1, 1]  | 2
Today this yields:
PanicException: List(Int64) not implemented
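For reference, the grouping I have in mind can be sketched in plain Python. This is only an illustration of the expected result, not how Daft would implement it: it assumes two lists belong to the same group when they are equal element by element, and it uses tuples as keys because Python lists are not hashable. The order of the groups is not meant to be significant.

```python
from collections import Counter

lists = [[1, 1, 1, 1], [1, 1, 1, 1], [2, 2, 2], [2, 2, 2]]

# Lists are unhashable, so convert each one to a tuple to use it as a group key.
counts = Counter(tuple(item) for item in lists)

# Convert the keys back to lists to mirror the "lists" / "list_count" columns.
grouped = [(list(key), count) for key, count in counts.items()]

for key, count in grouped:
    print(key, "|", count)
```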
Hi @nsalerni! I just made a new issue to track the work on grouping by list columns: https://github.com/Eventual-Inc/Daft/issues/1983