Support ORDER BY in AggregateUDF
Which issue does this PR close?
Closes #8984.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
Thank you @jayzhan211 -- I plan to review this tomorrow
Edit: I think we can just pass the logical ordering expr and not move the conversion downstream.
I think the problem here is that we need logical expr (ordering and schema) for accumulator. But the arguments pass into create_aggregate_expr are already physical expr. The reason why we need logical expr is that when we defining udaf, we only know logical expr but not physical expr.
https://github.com/apache/arrow-datafusion/blob/10d5f2df1c8f81a0d64a8644cfa429331f1e8ac3/datafusion/physical-plan/src/udaf.rs#L40-L45
I'm thinking about changing the function here to let us done the converting of logical to physical lately, so we can keep the logical expr for accumulator.
Other builtin accumulator don't have the problem because they are all defined in physical-expr unlike udaf.
@alamb Does the move of physical expression conversion make senses?
short summary for review: I think the current to discuss issue is to decide how can we have a flexible design for create_accumulator. FirstAccumulator introduce ignore_nulls, OrderSensitiveArrayAggAccumulator has reverse those are additional argument for them, the previous design does not consider the issue.
We may need
-
contextargument so we can have store any additional argument inside - Introduce user-defined accumulator, so what ever additional argument we need is flexible to change
I think there are also arguments for filter by in the future.
@alamb I think it is ready for review
Thanks @jayzhan211 -- I will check it out, hopefully tomorrow
I plan to change first_value to udaf-based directly, the current implementation doesn't seem correct.