prql
prql copied to clipboard
window function could use local sort
What's up?
Having the following data d
id value
1 1 2
2 2 3
3 3 4
4 6 2
5 5 3
6 4 4
How can I generate following SQL
SELECT
*,
AVG(value) OVER (
ORDER BY
id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS sma3
FROM
d
According to documentation it seems that I have to sort whole dataset rather than an input to windowing function, therefore query that is being generated by prql compiler includes an extra unwanted ORDER BY id at the end.
library(prqlr)
d = data.frame(id=c(1,2,3,6,5,4), value=c(2,3,4,2,3,4))
sql = prql_compile("from d | sort id | window rolling:3 (derive {sma3 = average value})")
cat(sql)
#SELECT
# *,
# AVG(value) OVER (
# ORDER BY
# id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
# ) AS sma3
#FROM
# d
#ORDER BY
# id
How can I define sorting locally for each window function call locally? so I don't have to clutter my SQL query with unwanted clauses.
Hi @jangorecki,
Great to see that you are trying out PRQL and posting questions.
You can put the sort inside the window pipeline.
So for your example:
library(prqlr)
d = data.frame(id=c(1,2,3,6,5,4), value=c(2,3,4,2,3,4))
sql = prql_compile("from d | window rolling:3 (sort id | derive {sma3 = average value})")
cat(sql)
or something that you can execude in the playground:
[{id=1, value=2},
{id=2, value=3},
{id=3, value=4},
{id=6, value=2},
{id=5, value=3},
{id=4, value=4},]
window rolling:3 (
sort id
derive {sma3 = average value}
)
which produces
WITH table_0 AS (
...
)
SELECT
id,
value,
AVG(value) OVER (
ORDER BY
id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS sma3
FROM
table_0
ORDER BY
id
-- Generated by PRQL compiler version:0.9.5 (https://prql-lang.org)
Thanks for reply, that make sense, could be mentioned in documentation. Although doesn't seem to address my problem. Not sure if you noticed but your example still produces extra, unneeded, order by at the end. @snth
I agree we could elide the ORDER BY at the end. (i wouldn't prioritize it that highly, but it is a good point)
(Thanks for the excellent reply @snth!)
I reopened the issue because that ORDER BY at the end could potentially cause an issue.
Putting a sort after the window does correctly override the ORDER BY id. However putting the sort before the window produces SQL that I believe is incorrect so I think this is a bug.
PRQL Input
[{id=1, value=2},
{id=2, value=3},
{id=3, value=4},
{id=6, value=2},
{id=5, value=3},
{id=4, value=4},]
sort value
window rolling:3 (
sort id
derive {sma3 = average value}
)
SQL Output
WITH table_0 AS (
SELECT
1 AS id,
2 AS value
UNION
ALL
SELECT
2 AS id,
3 AS value
UNION
ALL
SELECT
3 AS id,
4 AS value
UNION
ALL
SELECT
6 AS id,
2 AS value
UNION
ALL
SELECT
5 AS id,
3 AS value
UNION
ALL
SELECT
4 AS id,
4 AS value
)
SELECT
id,
value,
AVG(value) OVER (
ORDER BY
id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS sma3
FROM
table_0
ORDER BY
id
-- Generated by PRQL compiler version:0.9.5 (https://prql-lang.org)
Expected SQL Output
WITH table_0 AS (
SELECT
1 AS id,
2 AS value
UNION
ALL
SELECT
2 AS id,
3 AS value
UNION
ALL
SELECT
3 AS id,
4 AS value
UNION
ALL
SELECT
6 AS id,
2 AS value
UNION
ALL
SELECT
5 AS id,
3 AS value
UNION
ALL
SELECT
4 AS id,
4 AS value
)
SELECT
id,
value,
AVG(value) OVER (
ORDER BY
id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS sma3
FROM
table_0
ORDER BY
value
-- Generated by PRQL compiler version:0.9.5 (https://prql-lang.org)
(for anyone comparing the queries — it's the final term that is incorrect — id vs value — agree this seems to be a bug)
Agree, it is a bug. Just to note that this request asks for generating windowing function having local sort, and at the same time not bloating query with extra global sort, which is completely redundant.
Documentation for presenting use of local sort was already resolved in #3634