Daft icon indicating copy to clipboard operation
Daft copied to clipboard

Global Expressions: improved Aggregation syntax

Open jaychia opened this issue 2 years ago • 2 comments

Is your feature request related to a problem? Please describe.

Currently, Daft aggregation syntax is a little loose and is modelled after PyArrow.

df.agg([(col("a"), "agg-string")])

This has a few issues:

  1. Hard to document - I believe we actually don't have documentation anywhere at the moment about the full list of aggregation options available here.
  2. Hard/unintuitive to assign new names to the aggregated column: currently users have to do (col("a").alias("a_count"), "count") in order to avoid errors if grouping by column "a". Not doing so yields a pretty confusing error message: Expressions must all have unique names; saw a twice
  3. Misspelling the aggregation string also yields a confusing error: NotImplementedError: LogicalPlan construction for operation not implemented: foo. A full list of options is not presented to the user.

Proposal

# Syntax: df.agg(<aggregated_colname>=<agg_expression>)
df.agg(
    a_count=col("a").agg.count(),
    b_min=col("b").agg.min(),
)
  1. Documentation can be under the Expression.agg namespace
  2. Users assign a new name to the aggregated column using the kwarg's name
  3. No misspelling the aggregation string, since they are now method names and not arbitary strings

jaychia avatar Jun 05 '23 13:06 jaychia

cc @@skrawcz as well

jaychia avatar Jan 23 '24 21:01 jaychia

@jaychia for rolling/window support you could build an API like this:

df.agg( # if rows are weeks...
    a_3wk_rolling_mean=col("a").agg.mean().over(3),
    a_7wk_rolling_mean=col("a").agg.mean().over(7),
)

Related to aggregations -- being able to use them in an expression would be nice, so something like:

df.select(
   a=col("a"), 
   a_zero_mean=(col("a") - col("a").agg.mean())
) # or with cols, and then a select...

skrawcz avatar Jan 23 '24 22:01 skrawcz

@kevinzwang @jaychia Do we think we can close this now?

samster25 avatar Aug 29 '24 00:08 samster25

Yep, this has been resolved by #2000. Closing now!

kevinzwang avatar Aug 29 '24 01:08 kevinzwang