pinot
pinot copied to clipboard
Support generic sliding window aggregations
Currently in pinot there is no way to query Pinot table in a sliding window fashion. E.g. suppose I want to get the 7-day average number of Covid-19 cases. For now in order to do this, we'd need to send multiple queries; one query for each window.
We would like Pinot to support aggregation functions (like sum, count, avg, T-digest, etc) on sliding window within one single query.
LinkedIn Pinot team is currently working on this feature right now. We'll share the detailed design shortly.
Can you share some high level details about the approach being taken? Is it based aggregation UDF or does it requires changes to pinot core itself?
It will be good to add an interface for sliding window aggregates functions similar to transform and aggregate function.
Once we have that, we can implement moving average etc
Hi all,
Here is the design document for supporting window functions in Pinot: https://docs.google.com/document/d/17qUJl5hL1E8haB9APDsZTYa7askR6n5B0IQqyyZ3dYs/edit?usp=sharing
Please feel free to leave any comments or feedback.
Thanks, Jack
Thanks for putting this together. Very good write up.
High level thoughts
- can we do match recognize instead of window? It's much more modern and more powerful than window functions and is a superset
- in the first version, let's make sure we only support functions where local aggregation is possible or the number of records returned by server is limited (this can be configured by the user).. without this, even simple window functions can bring down the server.. we will need this for joins and sub queries as well
- should we create the query spi layer first to support FULL sql syntax parsing? This Will also allow us to build the window function operator incremental without invasive changes
Overall, this is a great initiative and let's try to think few steps ahead and lay the right foundation
can we do match recognize instead of window?
My understanding is that Window Functions would be applied over the resultset being produced by a query to show additional column aggregations by partitioning the resultset. match_recognize seems to work over the table produced by the FROM clause to output another table that matches certain pattern sequences. So they appear to be different in that sense, some of the internal operators for window function could apply to match_recognize as well. I think one could apply window functions over the output produced by match_recognize?
in the first version, let's make sure we only support functions where local aggregation is possible or the number of records returned by server is limited
Totally agree on that. Window function is not the only feature facing this (e.g. post aggregation in order-by, having clause with LIMIT, etc). Same for future features like joins and sub queries.
should we create the query spi layer first to support FULL sql syntax parsing? This Will also allow us to build the window function operator incremental without invasive changes
Do we know when we aim to finish the query spi layer? If it will take a while, we can first do the implementation on the execution part with the existing way of query compilation and then wire in with the new query spi layer once that's ready.
@jackjlli Weixiang (@weixiangsun) is working on the gapfill function (#7422) which is quite similar to the window function. Both features require reading the results from previous rows. Can you sync up with each other on the design so that we don't duplicate the implementation? I feel a lot of the code can be shared across the 2 features.
@kishoreg @mayankshriv @Jackie-Jiang @richardstartin we are planning to start window function implementation soon. Is there anything remaining with respect to design or any other related issue that needs to be discussed before we do so?
@amrishlal
Can you please address the comments from @Jackie-Jiang in the design doc? This is a complex feature and the design doc is mostly describing how Pinot operators work. It will help me a lot if you can expand on the following topics
- How will the Window Operator work
- Current proposed solution of pulling all records into a broker is not a scalable solution. Pulling all records into a single broker memory is not going to scale.
- Without a distributed solution, the response times are going to be orders of magnitude slower than running it via Presto/Trino connector
- In a typical database, Window operators are built on solid distributed partitioning, sorting primitives, and have some commonalities with primitives needed JOIN. Adding Joins first and then the window function is better in terms of sequencing.
- Nothing related to the design but the estimates in the design doc gives me a feeling that we are grossly underestimating the complexity here
Overall I am +1 on adding this feature but -1 on the design.
@kishoreg @mayankshriv @Jackie-Jiang @richardstartin
I got a bit sidetracked with other work after our last discussion on window functions, but would like to restart again. We have updated design based on the discussions that we had last time. There is also a fairly mature prototype that I am in process of cleaning up and finalizing. We would like to incorporate this code into Pinot. Please let me know your thoughts and comments.
cc: @siddharthteotia, @jackjlli, @mcvsubbu
Are we planning to implement this on the broker side? I'm okay with that as an initial version, but that won't be the long term solution, and will have a lot of constraints due to the limited resource on a single node. Long term solution should be to implement it on the multi-stage engine. cc @walterddr to also chime in
@Jackie-Jiang only the merge will be done on Broker side, the computation of window functions is on the server side. I think @siddharthteotia is going to be setting up time with you and @walterddr to discuss further. We do want to port the design to multi-stage engine as well.
@amrishlal I don't see that in the design doc. If you have created a new one, please share it here. IMO, it is not really possible to compute the window function on the server side without getting all the records first. Essentially broker needs to pull the matching records from server, then do the calculation.
I'll be working on this issue
Have a design document for phase 1 ready for review: https://docs.google.com/document/d/13CmFm4djI09JKF_Xty5acoXxJoxC9CLXAsmikgzgtIs/edit#heading=h.54yrqayotu6g
Initial set of query planning changes have been merged. Implementation for runtime is in progress
Closing this as the basic window functions are all supported