blazingsql
blazingsql copied to clipboard
Window Function
Why we need window functions? When working with big data window functions help to slice the things out like removing the duplicated with rank/row number/dense rank without theses inbuilt functions it's hard /complex to remove duplicates. and here are the few use cases listed below
- Running totals within the group's
- Athematic calculations like Max, Min, Avg within a group
- First Value / Last Vale /Nth Value within a group
Above are a few use cases of the standard window function. How about calculating/data filling on a sliding window(by range/rows) within a group like Sales for last's 7 days from the current day (sliding by range from the current date in the row to the last 7 days within the window).
Describe the solution you'd like For implementation please refer to Hive/Presto/Pandas/Spark SQL/MySQL/Postgres implementations
Additional context
Here are a few links which help to get a better understanding of these functions
ref:
http://shzhangji.com/blog/2017/09/04/hive-window-and-analytical-functions/
https://medium.com/jbennetcodes/how-to-get-rid-of-loops-and-use-window-functions-in-pandas-or-spark-sql-907f274850e4
https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
https://acadgild.com/blog/windowing-functions-in-hive
Enterprise DB Windows Functions documents
https://docs.snowflake.net/manuals/sql-reference/functions-analytic.html
https://docs.aws.amazon.com/redshift/latest/dg/c_Window_functions.html
RDMS DB Window Functions https://dev.mysql.com/doc/refman/8.0/en/window-functions-usage.html https://www.postgresql.org/docs/9.1/tutorial-window.html
We are in complete agreement that window functions are an important tool in SQL. At this time they are not supported but we should have the subset that is currently supported in CUDF in the next few sprints.
Window funtions support on cudf would be great for custreamz. We process micro batches and window function support helps us to run with in batches and across batches. Most common funtion would be groupby (inter & intra micro batches) & perform computations. Use cases mentioned above (#1, #2, #3) would help a lot
Basic info about BlazingSQL architecture: https://docs.google.com/document/d/1VJghfZLWleojxYZBRo6v3CZwzdnfHaLtMJm462TVeOA/edit?usp=sharing
Starting info about how to start implementing Window functions: https://docs.google.com/document/d/1EF0198yxlt3Uu6jJ6Yw_MkQls8gGWDolifjZhBbLXDg/edit?usp=sharing
We should make sure we look at this https://github.com/BlazingDB/blazingsql/pull/878 when implementing this feature
Cross linking several related issues for tracking purposes, as this issue feels like it could be the "catch-all" for various window functions:
- https://github.com/BlazingDB/blazingsql/issues/1069
- https://github.com/BlazingDB/blazingsql/issues/1008
- https://github.com/BlazingDB/blazingsql/issues/905
Other associated issues: #1478 #1450