datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

Support ORDER BY in AggregateUDF

Open jayzhan211 opened this issue 1 year ago • 1 comments

Which issue does this PR close?

Closes #8984.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

jayzhan211 avatar Feb 16 '24 14:02 jayzhan211

Thank you @jayzhan211 -- I plan to review this tomorrow

alamb avatar Feb 20 '24 07:02 alamb

Edit: I think we can just pass the logical ordering expr and not move the conversion downstream.

I think the problem here is that we need logical expr (ordering and schema) for accumulator. But the arguments pass into create_aggregate_expr are already physical expr. The reason why we need logical expr is that when we defining udaf, we only know logical expr but not physical expr.

https://github.com/apache/arrow-datafusion/blob/10d5f2df1c8f81a0d64a8644cfa429331f1e8ac3/datafusion/physical-plan/src/udaf.rs#L40-L45

I'm thinking about changing the function here to let us done the converting of logical to physical lately, so we can keep the logical expr for accumulator.

Other builtin accumulator don't have the problem because they are all defined in physical-expr unlike udaf.

@alamb Does the move of physical expression conversion make senses?

jayzhan211 avatar Mar 01 '24 06:03 jayzhan211

short summary for review: I think the current to discuss issue is to decide how can we have a flexible design for create_accumulator. FirstAccumulator introduce ignore_nulls, OrderSensitiveArrayAggAccumulator has reverse those are additional argument for them, the previous design does not consider the issue.

We may need

  1. context argument so we can have store any additional argument inside
  2. Introduce user-defined accumulator, so what ever additional argument we need is flexible to change

I think there are also arguments for filter by in the future.

jayzhan211 avatar Mar 12 '24 02:03 jayzhan211

@alamb I think it is ready for review

jayzhan211 avatar Mar 26 '24 00:03 jayzhan211

Thanks @jayzhan211 -- I will check it out, hopefully tomorrow

alamb avatar Mar 26 '24 02:03 alamb

I plan to change first_value to udaf-based directly, the current implementation doesn't seem correct.

jayzhan211 avatar Mar 30 '24 13:03 jayzhan211