trino icon indicating copy to clipboard operation
trino copied to clipboard

`range between` seems to not support `lag()`

Open emmansh opened this issue 1 year ago • 3 comments

Cross posted at Stack Overflow.


I want to get the previous value using lag(), over a partition defined with RANGE BETWEEN. I followed an example from the documentation:

WITH orders (custkey, orderdate, totalprice) 
    AS 
    (
    VALUES
    ('cust_1', DATE '2020-10-10', 100),
    ('cust_2', DATE '2020-10-10', 15),
    ('cust_1', DATE '2020-10-15', 200),
    ('cust_1', DATE '2020-10-16', 240),
    ('cust_2', DATE '2020-12-20', 25),
    ('cust_1', DATE '2020-12-25', 140),
    ('cust_2', DATE '2021-01-01', 5)    
)

SELECT *, 
       avg(totalprice) OVER (
                             PARTITION BY custkey
                             ORDER BY orderdate
                             RANGE BETWEEN INTERVAL '1' MONTH PRECEDING AND CURRENT ROW) AS past_month_avg,
    
       lag(totalprice) OVER ( PARTITION BY custkey
                              ORDER BY orderdate
                              RANGE BETWEEN INTERVAL '1' MONTH PRECEDING AND CURRENT ROW) AS previous_total_price_within_last_month
FROM orders

It returns:

custkey orderdate totalprice past_month_avg previous_total_price_within_last_month
cust_2 2020-10-10 15 15.0 NULL
cust_2 2020-12-20 25 25.0 15
cust_2 2021-01-01 5 15.0 25
cust_1 2020-10-10 100 100.0 NULL
cust_1 2020-10-15 200 150.0 100
cust_1 2020-10-16 240 180.0 200
cust_1 2020-12-25 140 140.0 240

The problem: while the calculation for past_month_avg results as expected, the previous_total_price_within_last_month result is not as expected.


Expected Output

Given that I defined a window that ranges at the last month, I expect that lag() will return null if the "previous" totalprice value is associated with an orderdate value that is out of the 1-month-back window.

custkey orderdate totalprice past_month_avg previous_total_price_within_last_month
cust_2 2020-10-10 15 15.0 NULL
cust_2 2020-12-20 25 25.0 NULL*
cust_2 2021-01-01 5 15.0 25
cust_1 2020-10-10 100 100.0 NULL
cust_1 2020-10-15 200 150.0 100
cust_1 2020-10-16 240 180.0 200
cust_1 2020-12-25 140 140.0 NULL*

*expected value


Does ranges between not support lag(), or is my example actually demonstrating the expected behavior?

emmansh avatar Oct 10 '24 17:10 emmansh

For functions such as lead and lag, the window frame is not supported. Per the SQL specification:

5) If <ntile function>, <lead or lag function>, <rank function type> or ROW_NUMBER 
   is specified, then:
  a) ...
  b) The window framing clause of WDX shall not be present.

The invocation should fail if the window frame is present. Currently, it's being ignored. That's a bug.

martint avatar Oct 10 '24 21:10 martint

@emmansh which version of Trino are you running?

I've written a test to verify whether the window frame is supported for the lag function: link. It shows that the current master behaves as expected, returning the error Cannot specify window frame for lag function. The same applies to the lead function.

However, for functions like rank, no such error is returned - this test fails. This is probably something we can improve.

piotrrzysko avatar Oct 18 '24 08:10 piotrrzysko

Hi, may i ask is this issue still open? can i work on this?

yuma-tietiedaxiang avatar Oct 19 '24 23:10 yuma-tietiedaxiang

Hi @yuma-tietiedaxiang, I started looking at this. However, I'm waiting for a response from @emmansh (see my last comment).

piotrrzysko avatar Oct 20 '24 08:10 piotrrzysko

@piotrrzysko I use Trino via AWS Athena. I'm not sure what Trino version it uses, but it certainly not the latest. See here why it's a problem to figure out the Trino version on AWS Athena.

emmansh avatar Oct 20 '24 08:10 emmansh

I verified this, and indeed, Athena returns the results provided by @emmansh. However, the current version of Trino throws an error for the lag function when used with the window framing clause, which is in accordance with the SQL specification. There is already a test that confirms this, and in #23856, I extended it to verify the lead function. Additionally, I included other functions for which the window framing clause is illegal.

piotrrzysko avatar Oct 21 '24 19:10 piotrrzysko