prql
prql copied to clipboard
How to use LAG?
Hi, first 10 minutes or so tinkering with prql after reading through some articles, the FAQ, roadmap, and first part of the book. I had never seen "LAG" before and it looks like something I wished for when writing previous SQL queries, where I had just joined with the previous rowid which is much more verbose. I couldn't figure out how to write it in PRQL in a way that translates to valid SQL.
from film
select [film_id, lag film_id]
translates to:
SELECT
film_id,
LAG("", film_id) OVER ()
FROM
film
Which is not a valid use of LAG, I expected it to translate to LAG(film_id) since the offset is optional. After drafting this issue I decided to tinker some more and found that this compiles correctly:
from film
select [film_id, lag 1 film_id]
And now I see that I should have read more closely the spec rather than assuming that it could be used 1:1 with the SQL version ๐. But as far as I know it is never valid to have lag without a column specified, so is there some way to change the definition of PRQL's LAG so that this wont compile? And is that general / generally done on other parts of the stdlib?
Submitting this as much to document in case anyone else searches as to ask the question ๐
Thanks for the report! This is indeed strange: parameter n of function lag is not optional and should throw an error. I'll look into that. It will probably be solved by #593
This is now fixed! Not sure we have the clearest error message, but it does raise an error:
from film
select [film_id, lag film_id]
Error:
โญโ[:2:18]
โ
2 โ select [film_id, lag film_id]
ยท โโโโโโฌโโโโโ
ยท โฐโโโโโโโ function std.select, param `columns` expected type `column`, but found type `func infer -> column`
โโโโฏ
...while this compiles successfully:
from film
select [film_id, lag 2 film_id]
SELECT
film_id,
LAG(film_id, 2) OVER ()
FROM
film
Thanks a lot for the issue @haydenflinner and appreciate your patience in waiting for the fix
This needs a better error message. I've opened an issue about it #1171