prql icon indicating copy to clipboard operation
prql copied to clipboard

Some aggregation functions need ORDER BY clause

Open aljazerzen opened this issue 10 months ago • 2 comments

let json_agg = func arr -> s"json_agg({arr})"

from invoices
sort total
aggregate (json_agg f"{month}: {total}")

(intention: I want a single row that is a json array, which contains "month: total" entries, sorted by total)

I think this PRQL should produce the following SQL:

SELECT
  json_agg(month || ': ' || total ORDER BY total)
FROM invoices

The important bit is that json_agg has its column ORDERed BY total.

This is because from invoices | sort total produces a relation, ordered by total, so this ordering should be accounted by the aggregation function.

aljazerzen avatar Apr 11 '24 11:04 aljazerzen

Yes agree.

What's the class of functions that require the ORDER BY? There's WINDOW, though possibly that's a separate issue entirely.

How would we encode this? We have a field on each function that specifies "include an ORDER BY clause within the function"?

max-sixty avatar May 30 '24 16:05 max-sixty

In PRQL, columns are arrays, not sets as they are in SQL. So columns do have an order they are in, which means that by default all aggregation function should take ordering into account.

So I think we could encode that in std.sql.prql with an annotation:

@{aggregation_unordered = true}
let sum = func col -> s"..."

@{aggregation_unordered = false}
let json_agg = func col -> s"..."

aljazerzen avatar May 30 '24 18:05 aljazerzen