prql icon indicating copy to clipboard operation
prql copied to clipboard

OVER clause should not include frame, when function does not support it

Open aljazerzen opened this issue 2 years ago • 3 comments

Some dialects (such as BigQuery) throw an error when you specify window frame in OVER clause, but the used function does not support it (see notes here).

In practice, this means that we should translate:

from employees
sort age
derive num = row_number

... into ...

SELECT
  *,
  ROW_NUMBER() OVER (ORDER BY age) AS num
FROM
  employees
ORDER BY
  age

instead of what we do now:

SELECT
  *,
  ROW_NUMBER() OVER (
    ORDER BY
      age ROWS BETWEEN UNBOUNDED PRECEDING
      AND UNBOUNDED FOLLOWING
  ) AS num
FROM
  employees
ORDER BY
  age

Because this includes knowing which functions support window frames and which not, I suggest we delay fix for this until the implementation of std lib has been moved into the backend. When we do this, RQ will contain name of the function instead of s-string which it has now, so SQL backend will be able to hardcode attributes of the function.

aljazerzen avatar Dec 09 '22 09:12 aljazerzen

Yes that sounds very reasonable.

An alternative (no view on which is better) is that we use types, and type the window functions which support frames vs those that don't.

Some overlap with the original issue in https://github.com/prql/prql/issues/1069 re different types of window functions!

max-sixty avatar Dec 09 '22 21:12 max-sixty

This can now be done.

For each of the window functions we need to know if it supports window frames or not. I believe this is not dialect dependent, but some dialects just treat it as an error and some ignore it.

  1. create the following function in sql/codegen.rs:
    fn supports_window_frame(expr: &rq::Expr) -> bool { ... }
    
    Take a look at BQ docs, the notes below to see which functions are supported. You can match the function name in ExprKind::BuildinFunction::name (contains name of the function in our stdlib).
  2. Add the call to your function here, and pass the result to translate_windowed.
  3. Don't emit the window frame in translate_windowed.

When done, this test will have to be updated work: test_window_functions_05. Also add a test to test.rs with the original example from this issue and an example using SUM instead of ROW_NUMBER (which should emit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING).

aljazerzen avatar Jan 05 '23 09:01 aljazerzen

I believe this is not dialect dependent, but some dialects just treat it as an error and some ignore it.

This was also my understanding, from https://github.com/prql/prql/issues/1069#issuecomment-1344867546. So agree with the plan to never emit a window clause for these functions, and then all dialects should behave correctly.

max-sixty avatar Jan 05 '23 17:01 max-sixty