velox
velox copied to clipboard
Implement window functions in Velox
Window functions https://prestodb.io/docs/current/functions/window.html#functions-window are a special category of analytic functions in SQL.
These are heavily used in TPC-DS benchmark and also critical to other client SQL workloads.
This issue tracks their implementation in Velox.
Efficient Processing of Window Functions in Analytical SQL Queries VLDB paper: http://www.vldb.org/pvldb/vol8/p1058-leis.pdf
Window functions in DuckDB: https://duckdb.org/docs/sql/window_functions
CC: @beroyfb @pedroerp
PR #1246 PR #1692 PR #1691
Doc https://docs.google.com/document/d/1YiBv7tuWK8lqXTXcgyTWkLYFEA1Rama8wrifdL5nNFw/edit has a description of the functional requirements for Window functions and high-level design thinking.
There are many TODO's in the doc that are being actively worked on.
PR #1695
Per offline discussion with @aditi-pandit , the plan is:
- Add support for row_number, rank and nth_value functions. This requires changes to the WindowFunction interface to pass channel indices of the inputs and calculation of peers in the operator.
- Add integration with Presto.
This will give us first milestone with 3 functions working end to end with default frame.
Next steps would be (1) cut up PRs and get these reviewed and landed (2) do performance evaluation and identify bottlenecks (3) optimize based on results from (2) (4) implement frames (5) add support for any aggregate function to be used in a window operator (6) add more window functions.
Items 3, 4, 5 can be worked on in parallel. Item 6 has a dependency on 4.
PR https://github.com/facebookincubator/velox/pull/1816
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Assuming this is still in progress - re-opening it.
Functional work pending: i) Support for k preceding/k following rows/range frames. ii) Functions first_value, last_value iii) Functions lead, lag
Performance optimizations WIP Spill to disk WIP
Hello @aditi-pandit ,
I saw is ignoreNulls defined https://github.com/facebookincubator/velox/blob/d0e1acfddfb1918957a0aceb215dbc01acecd82d/velox/core/PlanNode.h#L1876 but not used anywhere. Do you have any plan to support "IGNORE NULLS" for value functions like nth_value etc?
Thanks.
Yes, this needs to be handled.
@aditi-pandit Aditi, is anyone working on adding lead and lag functions? If not, I'm going to add these.
Hi @mbasmanova,
There are some PRs in progress for lead/lag https://github.com/facebookincubator/velox/pull/2942. We are planning to prepare for submission.
Hello @aditi-pandit , I saw is
ignoreNullsdefinedhttps://github.com/facebookincubator/velox/blob/d0e1acfddfb1918957a0aceb215dbc01acecd82d/velox/core/PlanNode.h#L1876
but not used anywhere. Do you have any plan to support "IGNORE NULLS" for value functions like
nth_valueetc? Thanks. Yes, this needs to be handled.
Hi @zhli1142015, Thanks for pointing that out. Yes, they need to be handled. There are plans to support them.
@aditi-pandit
There are some PRs in progress for lead/lag https://github.com/facebookincubator/velox/pull/2942.
Thank you for the pointer. The PR is very old though and doesn't seem to be actively worked on. What is the ETA for it to be ready for review?
@mbasmanova : Pramod is back from his vacation end of this week. He will pick it up after that.
Closing this issue since https://github.com/facebookincubator/velox/issues/5411 is the only functionality of Presto window not available in Velox now.