
Support for Selective Aggregates, Filter clause

Open jdye64 opened this issue 3 years ago • 4 comments

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
PostgreSQL supports the SQL FILTER clause, which restricts the rows fed to an aggregate to those matching a row-level predicate, before the aggregation is performed. DataFusion currently provides no mechanism for parsing this clause. See Filter Clause for more in-depth details on the clause's behavior.

Describe the solution you'd like
The datafusion::logical_plan::plan::Aggregate struct should include a new member, e.g. pub filter_expr: Vec<Expr>, containing the filtering expressions that the consuming engine could apply before performing the aggregations defined in pub aggr_expr: Vec<Expr>.
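To make the proposal concrete, here is a minimal sketch of what such a struct could look like. It is not DataFusion's actual definition: the Expr enum below is a tiny hypothetical stand-in, only the two field names aggr_expr and filter_expr come from this issue, and example_plan is an illustrative helper.

```rust
// Hypothetical stand-in for DataFusion's Expr type, just large enough to
// express SUM(a) FILTER (WHERE b > 0). Not the real enum.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
    Sum(Box<Expr>),
}

// Sketch of the proposed shape. The real Aggregate struct has more members;
// only the two fields named in this issue are shown.
#[derive(Debug, Clone, PartialEq)]
struct Aggregate {
    /// Aggregate expressions, e.g. SUM(a).
    pub aggr_expr: Vec<Expr>,
    /// Proposed: filter predicates to apply before aggregating.
    pub filter_expr: Vec<Expr>,
}

// Illustrative helper: the plan node for
//   SELECT SUM(a) FILTER (WHERE b > 0) FROM t
fn example_plan() -> Aggregate {
    Aggregate {
        aggr_expr: vec![Expr::Sum(Box::new(Expr::Column("a".to_string())))],
        filter_expr: vec![Expr::Gt(
            Box::new(Expr::Column("b".to_string())),
            Box::new(Expr::Literal(0)),
        )],
    }
}
```

How each filter is paired with its aggregate (by position, or one shared filter for the whole node) is left open here, since the request does not specify it.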

Describe alternatives you've considered
None

Additional context
Description of the syntax and functionality can be found here

jdye64, Apr 12 '22 15:04

Note that for most aggregation functions this could be done purely at the logical plan level by rewriting AGGREGATE(input) FILTER (WHERE condition) to AGGREGATE(IF(condition, input, NULL)). This works because aggregations usually ignore NULL values themselves. One exception I can think of would be ARRAY_AGG, which I think keeps NULL values.
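A small self-contained sketch of that equivalence, using plain Rust iterators rather than any DataFusion API. NULL is modelled as None, each row is an (input, condition) pair, and all function names are made up for illustration.

```rust
// Each row is (input value, result of the FILTER condition).
type Row = (Option<i64>, bool);

// AGG(input) FILTER (WHERE condition): drop non-matching rows first.
fn filtered<'a>(rows: &'a [Row]) -> impl Iterator<Item = Option<i64>> + 'a {
    rows.iter().filter(|(_, keep)| *keep).map(|(v, _)| *v)
}

// AGG(IF(condition, input, NULL)): keep every row, but null out the
// input of rows that do not match.
fn rewritten<'a>(rows: &'a [Row]) -> impl Iterator<Item = Option<i64>> + 'a {
    rows.iter().map(|(v, keep)| if *keep { *v } else { None })
}

// SUM-like aggregate: ignores NULLs, returns NULL when nothing was summed.
fn sum_ignoring_nulls(values: impl Iterator<Item = Option<i64>>) -> Option<i64> {
    values
        .flatten()
        .fold(None, |acc: Option<i64>, v| Some(acc.unwrap_or(0) + v))
}

// ARRAY_AGG-like aggregate that keeps NULL entries.
fn array_agg(values: impl Iterator<Item = Option<i64>>) -> Vec<Option<i64>> {
    values.collect()
}

// Sample rows; the second one fails the filter condition.
fn sample_rows() -> Vec<Row> {
    vec![(Some(1), true), (Some(2), false), (None, true), (Some(4), true)]
}
```

With these sample rows, both forms give the same result for the SUM-like aggregate, while the ARRAY_AGG-like one picks up an extra NULL entry under the rewrite, so it would need separate handling.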

jhorstmann, Apr 13 '22 11:04

I would love to pick this up and work on it.

poonai, Sep 05 '22 08:09

There is a related PR that adds support in the SQL query planner and logical plan, but it does not add physical plan support: https://github.com/apache/arrow-datafusion/pull/3405

andygrove, Sep 09 '22 13:09

Excited! I've implemented PhysicalExpr with filter support. I'll raise a PR with the relevant changes after the mentioned PR gets merged.

poonai, Sep 09 '22 13:09