prql icon indicating copy to clipboard operation
prql copied to clipboard

incorrect range in window function

Open korbash opened this issue 1 year ago • 2 comments

What happened?

I see two problems. Here, I will present the first one. In windows, a range of a single number is not processed correctly. (output from playground)

PRQL input

from login_event
group track_id (
    window rows:-1..-1 (
        sort time_upload
        derive {
            last_user = min user_id
            }
    )
)

SQL output

SELECT
  *,
  MIN(user_id) OVER (
    PARTITION BY track_id
    ORDER BY
      time_upload ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS last_user
FROM
  login_event

-- Generated by PRQL compiler version:0.11.3 (https://prql-lang.org)

Expected SQL output

SELECT
  *,
  MIN(user_id) OVER (
    PARTITION BY track_id
    ORDER BY
      time_upload ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING
  ) AS last_user
FROM
  login_event

MVCE confirmation

  • [X] Minimal example
  • [X] New issue

Anything else?

Of course, this problem is somewhat artificial since I can use the lag function.

korbash avatar Feb 20 '24 09:02 korbash

The second problem is again with the range, but now when using the window function first (output from playground)

PRQL input

from login_event
group track_id (
    window rows:-5..0 (
        sort time_upload
        derive {
            last_user = first user_id
            }
    )
)

SQL output

SELECT
  *,
  FIRST_VALUE(user_id) OVER (
    PARTITION BY track_id
    ORDER BY
      time_upload
  ) AS last_user
FROM
  login_event

-- Generated by PRQL compiler version:0.11.3 (https://prql-lang.org)

Expected SQL output

SELECT
  *,
  FIRST_VALUE(user_id) OVER (
    PARTITION BY track_id
    ORDER BY
      time_upload ROWS BETWEEN 5 PRECEDING AND CURRENT ROW
  ) AS last_user
FROM
  login_event

korbash avatar Feb 20 '24 10:02 korbash

Thanks for the issues @korbash

For the first, here's a smaller example:

from login_event
window rows:1..1 (
  sort time_upload
  derive {
      last_user = min user_id
  }
)

SELECT
  *,
  MIN(user_id) OVER (
    ORDER BY
      time_upload ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS last_user
FROM
  login_event
ORDER BY
  time_upload

-- Generated by PRQL compiler version:0.11.3 (https://prql-lang.org)

If the range expression changes to 1..2, this works correctly, so something is breaking when the values are the same. I had a look for a few minutes but couldn't find where exactly. It's before generating the RQ, since the RQ contains:


          window:
            frame:
              kind: Rows
              range:
                start: null
                end: null
            partition: []

max-sixty avatar Feb 21 '24 07:02 max-sixty