rust-dataframe
Grouping and Aggregation Expressions
In order to implement aggregations, we need to be able to group data. Like joins, the task of grouping probably belongs upstream, but we should be able to define how to group data.
The LazyFrame might need some state (whether it's grouped or not) to prevent 'normal' calculations when it's in a grouped state. I don't want to implement a GroupedLazyFrame because we rely on mutating the &mut LazyFrame to add on computations.
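To make the state idea concrete, here is a minimal sketch. It assumes a stand-in `LazyFrame` that only records operation names; the `grouped_by` field and the `group_by` and `add_computation` methods are hypothetical names chosen for illustration, not part of the existing code.

```rust
/// Hypothetical stand-in for the real LazyFrame; it only tracks queued operations.
#[derive(Debug, Default)]
pub struct LazyFrame {
    /// Names of queued computations (placeholder for real expression nodes).
    ops: Vec<String>,
    /// `Some(columns)` while the frame is grouped, `None` otherwise.
    grouped_by: Option<Vec<String>>,
}

impl LazyFrame {
    /// Marks the frame as grouped, mutating it in place like other builder calls.
    pub fn group_by(&mut self, columns: &[&str]) -> &mut Self {
        self.grouped_by = Some(columns.iter().map(|c| c.to_string()).collect());
        self
    }

    /// A 'normal' calculation; rejected while the frame is in a grouped state.
    pub fn add_computation(&mut self, name: &str) -> Result<&mut Self, String> {
        if self.grouped_by.is_some() {
            return Err(format!("cannot apply `{}` to a grouped frame", name));
        }
        self.ops.push(name.to_string());
        Ok(self)
    }

    /// Number of computations queued so far.
    pub fn queued(&self) -> usize {
        self.ops.len()
    }
}
```

The check happens at runtime rather than in the type system, which is the trade-off of keeping a single `LazyFrame` type instead of a separate grouped type.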
A single aggregation call should ideally accept multiple aggregations. A grouping should accept multiple columns, and any column that is neither grouped nor aggregated gets dropped.
Data is grouped in order to be aggregated, so perhaps it might be better not to create an intermediate grouped data structure, but instead take the grouping and aggregations at the same time.
Something like:
impl LazyFrame {
    fn aggregate(grouping: Vec<_>, aggregates: Vec<_>) -> Self;
}
The grouping can be Vec<&str>, but aggregates should be more expressive, as either functions that implement some aggregation trait, or an enum if we support a finite list of aggregations.
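As a rough sketch of the enum option, the following assumes a finite `Aggregation` enum whose variants each name their input column, and a stand-in `LazyFrame` that only tracks column names and a queued plan. The variant names, the `plan` field, and the `&mut self` receiver (chosen to match the mutate-in-place style mentioned earlier) are all assumptions, not the actual API.

```rust
/// Hypothetical finite list of supported aggregations; each names its input column.
#[derive(Debug, Clone, PartialEq)]
pub enum Aggregation {
    Sum(String),
    Min(String),
    Max(String),
    Count(String),
}

impl Aggregation {
    /// The column this aggregation reads.
    pub fn column(&self) -> &str {
        match self {
            Aggregation::Sum(c)
            | Aggregation::Min(c)
            | Aggregation::Max(c)
            | Aggregation::Count(c) => c.as_str(),
        }
    }
}

/// Stand-in for the real LazyFrame: column names plus queued group/aggregate steps.
#[derive(Debug, Default)]
pub struct LazyFrame {
    columns: Vec<String>,
    plan: Vec<(Vec<String>, Vec<Aggregation>)>,
}

impl LazyFrame {
    pub fn new(columns: &[&str]) -> Self {
        LazyFrame {
            columns: columns.iter().map(|c| c.to_string()).collect(),
            plan: Vec::new(),
        }
    }

    /// Current output column names.
    pub fn columns(&self) -> &[String] {
        &self.columns
    }

    /// Number of queued group/aggregate steps.
    pub fn steps(&self) -> usize {
        self.plan.len()
    }

    /// Takes the grouping and the aggregations together, with no intermediate
    /// grouped structure. Output columns are the grouping keys followed by the
    /// aggregated columns; every other column is dropped.
    pub fn aggregate(&mut self, grouping: Vec<&str>, aggregates: Vec<Aggregation>) -> &mut Self {
        let keys: Vec<String> = grouping.iter().map(|c| c.to_string()).collect();
        let mut output = keys.clone();
        output.extend(aggregates.iter().map(|a| a.column().to_string()));
        self.plan.push((keys, aggregates));
        self.columns = output;
        self
    }
}
```

An enum keeps the set of aggregations closed and easy to match on when building the plan; a trait-based design would allow user-defined aggregations at the cost of dynamic dispatch or generics in the signature.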