prql icon indicating copy to clipboard operation
prql copied to clipboard

How to use LAG?

Open haydenflinner opened this issue 3 years ago โ€ข 1 comments

Hi, first 10 minutes or so tinkering with prql after reading through some articles, the FAQ, roadmap, and first part of the book. I had never seen "LAG" before and it looks like something I wished for when writing previous SQL queries, where I had just joined with the previous rowid which is much more verbose. I couldn't figure out how to write it in PRQL in a way that translates to valid SQL.

from film
select [film_id, lag film_id]

translates to:

SELECT
  film_id,
  LAG("", film_id) OVER ()
FROM
  film

Which is not a valid use of LAG, I expected it to translate to LAG(film_id) since the offset is optional. After drafting this issue I decided to tinker some more and found that this compiles correctly:

  from film
  select [film_id, lag 1 film_id]

And now I see that I should have read more closely the spec rather than assuming that it could be used 1:1 with the SQL version ๐Ÿ˜›. But as far as I know it is never valid to have lag without a column specified, so is there some way to change the definition of PRQL's LAG so that this wont compile? And is that general / generally done on other parts of the stdlib?

Submitting this as much to document in case anyone else searches as to ask the question ๐Ÿ‘

haydenflinner avatar Aug 26 '22 02:08 haydenflinner

Thanks for the report! This is indeed strange: parameter n of function lag is not optional and should throw an error. I'll look into that. It will probably be solved by #593

aljazerzen avatar Aug 29 '22 08:08 aljazerzen

This is now fixed! Not sure we have the clearest error message, but it does raise an error:

from film
select [film_id, lag film_id]
Error: 
   โ•ญโ”€[:2:18]
   โ”‚
 2 โ”‚ select [film_id, lag film_id]
   ยท                  โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€  
   ยท                       โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€ function std.select, param `columns` expected type `column`, but found type `func infer -> column`
โ”€โ”€โ”€โ•ฏ

...while this compiles successfully:

from film
select [film_id, lag 2 film_id]
SELECT
  film_id,
  LAG(film_id, 2) OVER ()
FROM
  film

Thanks a lot for the issue @haydenflinner and appreciate your patience in waiting for the fix

max-sixty avatar Nov 28 '22 08:11 max-sixty

This needs a better error message. I've opened an issue about it #1171

aljazerzen avatar Nov 28 '22 09:11 aljazerzen