firebird icon indicating copy to clipboard operation
firebird copied to clipboard

Support ordered-set (WITHIN GROUP) aggregates.

Open sim1984 opened this issue 2 years ago • 7 comments

Need to add support for ordered-set and hypothetical-set aggregate functions, as well as implementations of the instances defined in SQL:2008 (percentile_cont(), percentile_disc(), rank(), dense_rank(), percent_rank(), cume_dist()). It is good to added mode() though it is not in the spec. In addition, you can gracefully solve the problem of element unordering in the LIST aggregate function (in the SQL standard LISTAGG).

Ordered-Set Aggregate Functions

PERCENTILE_CONT( <expr> ) WITHIN GROUP ( ORDER BY <order_expr> )
PERCENTILE_DISC( <expr> ) WITHIN GROUP ( ORDER BY <order_expr>)
LIST(<expr>, <separator>) WITHIN GROUP(ORDER BY <order_expr>)

It is also possible to introduce the LISTAGG function, and make LIST an alias for compatibility with the standard.

Hypothetical-Set Aggregate Functions

RANK( <args> ) WITHIN GROUP ( ORDER BY <order_exprs> )
DENSE_RANK( <args> ) WITHIN GROUP ( ORDER BY <order_exprs> )
PERCENT_RANK( <args> ) WITHIN GROUP ( ORDER BY <order_exprs> )
CUME_DIST( <expr> ) WITHIN GROUP ( ORDER BY <order_exprs> )

You can read more about how this is done in PostgreSQL here. https://www.postgresql.org/docs/current/functions-aggregate.html https://paquier.xyz/postgresql-2/postgres-9-4-feature-highlight-within-group/

sim1984 avatar Jun 16 '23 08:06 sim1984

If LISTAGG is added for standard compatibility, should legacy LIST also support ordering? If yes, should it follow the standard syntax (so that LIST and LISTAGG just become synonyms) or, being non-standard, may use its own syntax for ordering, e.g. LIST(FIELD1 ORDER BY FIELD2)?

dyemanov avatar Jan 14 '24 18:01 dyemanov

I think it's enough to make it just a synonym. But if they make their own version to simplify, then do not object.

sim1984 avatar Jan 14 '24 18:01 sim1984

The standard LISTAGG also has the separator string being mandatory while our LIST implementation has it optional, with a comma implied if omitted. And the standard LISTAGG defines the separator as a literal only, while we allow arbitrary value expressions. Would it be OK to preserve these differences for the LISTAGG too?

dyemanov avatar Jan 15 '24 06:01 dyemanov

Limiting the delimiter to only literal values is not very correct in my opinion. But sorting for LISTAGG must be made mandatory. In principle, the separator can be made mandatory (there are doubts here). Does anyone have other opinions?

sim1984 avatar Jan 15 '24 06:01 sim1984

Sorting may be mandatory in LISTAGG (as per SQL standard), but it must be optional (or missing completely) in LIST due to backward compatibility.

dyemanov avatar Jan 15 '24 06:01 dyemanov

Oracle supports LISTAGG in mostly standard way (without <listagg overflow clause>), but it has optional separator. So maybe we should preserve this syntax.

dyemanov avatar Jan 15 '24 07:01 dyemanov

Agree. In this case, we have full compatibility with the current LIST function, with the exception of the mandatory sorting. For LIST, sorting can be left optional for backward compatibility reasons. Please note in the documentation that in new projects it is recommended to use the standard LISTAGG.

sim1984 avatar Jan 15 '24 07:01 sim1984