pinot icon indicating copy to clipboard operation
pinot copied to clipboard

Support generic sliding window aggregations

Open jackjlli opened this issue 3 years ago • 14 comments

Currently in pinot there is no way to query Pinot table in a sliding window fashion. E.g. suppose I want to get the 7-day average number of Covid-19 cases. For now in order to do this, we'd need to send multiple queries; one query for each window.

We would like Pinot to support aggregation functions (like sum, count, avg, T-digest, etc) on sliding window within one single query.

jackjlli avatar Jul 27 '21 05:07 jackjlli

LinkedIn Pinot team is currently working on this feature right now. We'll share the detailed design shortly.

jackjlli avatar Jul 27 '21 05:07 jackjlli

Can you share some high level details about the approach being taken? Is it based aggregation UDF or does it requires changes to pinot core itself?

ashishkf avatar Sep 05 '21 04:09 ashishkf

It will be good to add an interface for sliding window aggregates functions similar to transform and aggregate function.

Once we have that, we can implement moving average etc

kishoreg avatar Sep 05 '21 04:09 kishoreg

Hi all,

Here is the design document for supporting window functions in Pinot: https://docs.google.com/document/d/17qUJl5hL1E8haB9APDsZTYa7askR6n5B0IQqyyZ3dYs/edit?usp=sharing

Please feel free to leave any comments or feedback.

Thanks, Jack

jackjlli avatar Nov 04 '21 02:11 jackjlli

Thanks for putting this together. Very good write up.

High level thoughts

  • can we do match recognize instead of window? It's much more modern and more powerful than window functions and is a superset
  • in the first version, let's make sure we only support functions where local aggregation is possible or the number of records returned by server is limited (this can be configured by the user).. without this, even simple window functions can bring down the server.. we will need this for joins and sub queries as well
  • should we create the query spi layer first to support FULL sql syntax parsing? This Will also allow us to build the window function operator incremental without invasive changes

Overall, this is a great initiative and let's try to think few steps ahead and lay the right foundation

kishoreg avatar Nov 04 '21 03:11 kishoreg

can we do match recognize instead of window?

My understanding is that Window Functions would be applied over the resultset being produced by a query to show additional column aggregations by partitioning the resultset. match_recognize seems to work over the table produced by the FROM clause to output another table that matches certain pattern sequences. So they appear to be different in that sense, some of the internal operators for window function could apply to match_recognize as well. I think one could apply window functions over the output produced by match_recognize?

amrishlal avatar Nov 04 '21 16:11 amrishlal

in the first version, let's make sure we only support functions where local aggregation is possible or the number of records returned by server is limited

Totally agree on that. Window function is not the only feature facing this (e.g. post aggregation in order-by, having clause with LIMIT, etc). Same for future features like joins and sub queries.

should we create the query spi layer first to support FULL sql syntax parsing? This Will also allow us to build the window function operator incremental without invasive changes

Do we know when we aim to finish the query spi layer? If it will take a while, we can first do the implementation on the execution part with the existing way of query compilation and then wire in with the new query spi layer once that's ready.

jackjlli avatar Nov 04 '21 21:11 jackjlli

@jackjlli Weixiang (@weixiangsun) is working on the gapfill function (#7422) which is quite similar to the window function. Both features require reading the results from previous rows. Can you sync up with each other on the design so that we don't duplicate the implementation? I feel a lot of the code can be shared across the 2 features.

Jackie-Jiang avatar Dec 03 '21 01:12 Jackie-Jiang

@kishoreg @mayankshriv @Jackie-Jiang @richardstartin we are planning to start window function implementation soon. Is there anything remaining with respect to design or any other related issue that needs to be discussed before we do so?

amrishlal avatar Dec 16 '21 21:12 amrishlal

@amrishlal

Can you please address the comments from @Jackie-Jiang in the design doc? This is a complex feature and the design doc is mostly describing how Pinot operators work. It will help me a lot if you can expand on the following topics

  • How will the Window Operator work
  • Current proposed solution of pulling all records into a broker is not a scalable solution. Pulling all records into a single broker memory is not going to scale.
  • Without a distributed solution, the response times are going to be orders of magnitude slower than running it via Presto/Trino connector
  • In a typical database, Window operators are built on solid distributed partitioning, sorting primitives, and have some commonalities with primitives needed JOIN. Adding Joins first and then the window function is better in terms of sequencing.
  • Nothing related to the design but the estimates in the design doc gives me a feeling that we are grossly underestimating the complexity here

Overall I am +1 on adding this feature but -1 on the design.

kishoreg avatar Jan 13 '22 20:01 kishoreg

@kishoreg @mayankshriv @Jackie-Jiang @richardstartin

I got a bit sidetracked with other work after our last discussion on window functions, but would like to restart again. We have updated design based on the discussions that we had last time. There is also a fairly mature prototype that I am in process of cleaning up and finalizing. We would like to incorporate this code into Pinot. Please let me know your thoughts and comments.

cc: @siddharthteotia, @jackjlli, @mcvsubbu

amrishlal avatar Aug 15 '22 22:08 amrishlal

Are we planning to implement this on the broker side? I'm okay with that as an initial version, but that won't be the long term solution, and will have a lot of constraints due to the limited resource on a single node. Long term solution should be to implement it on the multi-stage engine. cc @walterddr to also chime in

Jackie-Jiang avatar Aug 16 '22 17:08 Jackie-Jiang

@Jackie-Jiang only the merge will be done on Broker side, the computation of window functions is on the server side. I think @siddharthteotia is going to be setting up time with you and @walterddr to discuss further. We do want to port the design to multi-stage engine as well.

amrishlal avatar Aug 17 '22 20:08 amrishlal

@amrishlal I don't see that in the design doc. If you have created a new one, please share it here. IMO, it is not really possible to compute the window function on the server side without getting all the records first. Essentially broker needs to pull the matching records from server, then do the calculation.

Jackie-Jiang avatar Aug 17 '22 20:08 Jackie-Jiang

I'll be working on this issue

somandal avatar Jan 26 '23 21:01 somandal

Have a design document for phase 1 ready for review: https://docs.google.com/document/d/13CmFm4djI09JKF_Xty5acoXxJoxC9CLXAsmikgzgtIs/edit#heading=h.54yrqayotu6g

somandal avatar Jan 26 '23 21:01 somandal

Initial set of query planning changes have been merged. Implementation for runtime is in progress

siddharthteotia avatar Feb 09 '23 06:02 siddharthteotia

Closing this as the basic window functions are all supported

Jackie-Jiang avatar Jun 07 '24 05:06 Jackie-Jiang