malloy icon indicating copy to clipboard operation
malloy copied to clipboard

Window Functions

Open anikaks opened this issue 3 years ago • 5 comments

Add support for window functions. A few we'd probably like to support:

  • LAG / LEAD : "how long did it take for User to get from step A to step B?" "How long are people typically idle after they do x action?" "What was the previous status of this item?"
  • FIRST_VALUE / LAST_VALUE / NTH_VALUE - important for event/session data. "what was the first action User took? What was the first/last action or landing/exit page in User's session?"; grabbing the "first" ever referrer for a user across many sessions, filling in each row of a historical state table from EAV data. etc
  • RANK / ROW_NUMBER - often for sequencing things like transactions "What percent of purchases come from first-time vs newish vs seasoned customers?" or just for generating a primary key where there isn't one, or even for funnel analysis / finding given value in a partition
  • Percentile functions (including MEDIAN), helpful for visualizing the data and seeing the distribution

anikaks avatar Sep 28 '21 23:09 anikaks

We had a request from someone in Slack wanting to do a rolling average today. Throwing in an approach that works for now in case others are searching for this:

query: rolling_average_line_chart is table('malloy-data.ecomm.order_items'){
  primary_key: id
} 
-> {
  nest: orders_1 is {
    group_by: order_date is created_at.day 
    aggregate: total_sales is sale_price.sum()
  }
  nest: orders_2 is {
    group_by: order_date is created_at.day 
    aggregate: total_sales is sale_price.sum()
  }
} 
-> {
  where: 
    orders_2.order_date < orders_1.order_date,
    orders_2.order_date >= orders_1.order_date - 7 days
  project: order_date is orders_1.order_date
  project: total_sales_2 is  orders_2.total_sales
}
-> {
  group_by: order_date
  aggregate: rolling_average is total_sales_2.sum() / 7
}

but this will be muuuuch nicer when we can support window functions and/or if we provide a smooth way to invoke a rolling average.

anikaks avatar Mar 30 '22 22:03 anikaks

Documenting another example; this gets you FIRST_VALUE / LAST_VALUE. To figure out the first product each user ever ordered:

query: user_first_product is order_items -> {
  group_by: user_id
  nest: first_product is {
    group_by: 
      product_name is inventory_items.product_name
      created_at
    order_by: created_at asc
    limit: 1
  }
} -> {
  project: user_id, first_product.product_name
}

anikaks avatar Apr 28 '22 00:04 anikaks

is this feature (enabling window functions) being worked upon? 🤔

nandubatchu avatar Jun 12 '23 20:06 nandubatchu

Yes. It lands in pre-release tomorrow.

On Mon, Jun 12, 2023, 1:10 PM Yadunandan Batchu @.***> wrote:

is this feature (enabling window functions) being worked upon? 🤔

— Reply to this email directly, view it on GitHub https://github.com/malloydata/malloy/issues/46#issuecomment-1588018991, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAIK6UUZEVM3OLYUBOTC5JTXK5ZRVANCNFSM5E6QZJDA . You are receiving this because you were assigned.Message ID: @.***>

lloydtabb avatar Jun 12 '23 21:06 lloydtabb

Is supporting PARTITION BY within the calculate analytic/window function on the round map?

gavin-tynan-beamery avatar Dec 05 '23 11:12 gavin-tynan-beamery