blazingsql icon indicating copy to clipboard operation
blazingsql copied to clipboard

Window Function

Open rsdpyenugula opened this issue 5 years ago • 6 comments

Why we need window functions? When working with big data window functions help to slice the things out like removing the duplicated with rank/row number/dense rank without theses inbuilt functions it's hard /complex to remove duplicates. and here are the few use cases listed below

  1. Running totals within the group's
  2. Athematic calculations like Max, Min, Avg within a group
  3. First Value / Last Vale /Nth Value within a group

Above are a few use cases of the standard window function. How about calculating/data filling on a sliding window(by range/rows) within a group like Sales for last's 7 days from the current day (sliding by range from the current date in the row to the last 7 days within the window).

Describe the solution you'd like For implementation please refer to Hive/Presto/Pandas/Spark SQL/MySQL/Postgres implementations

Additional context Here are a few links which help to get a better understanding of these functions
ref: http://shzhangji.com/blog/2017/09/04/hive-window-and-analytical-functions/ https://medium.com/jbennetcodes/how-to-get-rid-of-loops-and-use-window-functions-in-pandas-or-spark-sql-907f274850e4 https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html https://acadgild.com/blog/windowing-functions-in-hive

Enterprise DB Windows Functions documents
https://docs.snowflake.net/manuals/sql-reference/functions-analytic.html https://docs.aws.amazon.com/redshift/latest/dg/c_Window_functions.html

RDMS DB Window Functions https://dev.mysql.com/doc/refman/8.0/en/window-functions-usage.html https://www.postgresql.org/docs/9.1/tutorial-window.html

rsdpyenugula avatar Oct 25 '19 04:10 rsdpyenugula

We are in complete agreement that window functions are an important tool in SQL. At this time they are not supported but we should have the subset that is currently supported in CUDF in the next few sprints.

felipeblazing avatar Oct 25 '19 06:10 felipeblazing

Window funtions support on cudf would be great for custreamz. We process micro batches and window function support helps us to run with in batches and across batches. Most common funtion would be groupby (inter & intra micro batches) & perform computations. Use cases mentioned above (#1, #2, #3) would help a lot

satishvarmadandu avatar Jun 01 '20 14:06 satishvarmadandu

Basic info about BlazingSQL architecture: https://docs.google.com/document/d/1VJghfZLWleojxYZBRo6v3CZwzdnfHaLtMJm462TVeOA/edit?usp=sharing

Starting info about how to start implementing Window functions: https://docs.google.com/document/d/1EF0198yxlt3Uu6jJ6Yw_MkQls8gGWDolifjZhBbLXDg/edit?usp=sharing

wmalpica avatar Jul 15 '20 14:07 wmalpica

We should make sure we look at this https://github.com/BlazingDB/blazingsql/pull/878 when implementing this feature

wmalpica avatar Oct 08 '20 16:10 wmalpica

Cross linking several related issues for tracking purposes, as this issue feels like it could be the "catch-all" for various window functions:

  • https://github.com/BlazingDB/blazingsql/issues/1069
  • https://github.com/BlazingDB/blazingsql/issues/1008
  • https://github.com/BlazingDB/blazingsql/issues/905

beckernick avatar Oct 16 '20 21:10 beckernick

Other associated issues: #1478 #1450

wmalpica avatar Apr 16 '21 15:04 wmalpica