SERVER-7463 New $percentile accumulator

Open mehdiabolfathi opened this issue 5 years ago • 8 comments

Dear MongoDB team, we use MongoDB to collect data from gateways, routers, and CPEs. The MongoDB Aggregation API satisfies most of our requirements for memory-efficient and reliable accumulation. At the moment, however, we calculate percentiles outside of MongoDB. With a large data set, reading the data out of MongoDB just to compute a percentile is not optimal.

To optimise some of our use cases that calculate percentiles, we have implemented a built-in accumulator for MongoDB using the well-known t-digest algorithm (https://github.com/tdunning/t-digest), an online approach with a constant memory bound and constant accuracy. Because independent digests can be merged, sharded setups can benefit from distributed accumulation.
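As a rough illustration of why mergeable digests suit sharded setups, here is a toy centroid-based digest in plain JavaScript. It is not the t-digest algorithm and not Folly's TDigest: it caps centroid weights uniformly instead of using t-digest's scale function, and the names `compress`, `fromValues`, `merge`, and `quantile` are made up for this sketch.

```javascript
// Toy mergeable digest (assumption: a simplified stand-in, NOT t-digest itself).
// A digest is an array of centroids { mean, weight } sorted by mean.

// Greedily merge neighbouring centroids while their combined weight stays
// under total / maxSize. The result holds at most 2 * maxSize centroids.
function compress(centroids, maxSize) {
  const sorted = [...centroids].sort((a, b) => a.mean - b.mean);
  if (sorted.length <= maxSize) return sorted;
  const total = sorted.reduce((sum, c) => sum + c.weight, 0);
  const limit = total / maxSize; // uniform weight cap (assumption of this sketch)
  const out = [];
  let cur = null;
  for (const c of sorted) {
    if (cur !== null && cur.weight + c.weight <= limit) {
      const w = cur.weight + c.weight;
      cur.mean = (cur.mean * cur.weight + c.mean * c.weight) / w;
      cur.weight = w;
    } else {
      cur = { mean: c.mean, weight: c.weight }; // copy, so inputs stay untouched
      out.push(cur);
    }
  }
  return out;
}

// Build a digest from raw values (each value starts as a weight-1 centroid).
function fromValues(values, maxSize) {
  return compress(values.map((v) => ({ mean: v, weight: 1 })), maxSize);
}

// Merging two digests is just compressing their combined centroids.
function merge(a, b, maxSize) {
  return compress(a.concat(b), maxSize);
}

// Estimate a quantile by interpolating between centroid centres.
function quantile(digest, q) {
  if (digest.length === 0) return NaN;
  const total = digest.reduce((sum, c) => sum + c.weight, 0);
  const target = q * total;
  let cum = 0;
  let prevPos = null;
  let prevMean = null;
  for (const c of digest) {
    const pos = cum + c.weight / 2; // centre of this centroid's weight range
    if (target <= pos) {
      if (prevPos === null) return c.mean;
      const t = (target - prevPos) / (pos - prevPos);
      return prevMean + t * (c.mean - prevMean);
    }
    prevPos = pos;
    prevMean = c.mean;
    cum += c.weight;
  }
  return prevMean;
}

// Two hypothetical shards each see half of the values 1..1000.
const maxSize = 50;
const shardA = [];
const shardB = [];
for (let v = 1; v <= 1000; v++) (v % 2 ? shardA : shardB).push(v);

const merged = merge(fromValues(shardA, maxSize), fromValues(shardB, maxSize), maxSize);
const p95 = quantile(merged, 0.95); // close to 950 for this input
```

Each shard only ships its small digest rather than its raw values, and the merged digest still gives a usable estimate. A real t-digest keeps centroids smaller near the tails to improve accuracy at extreme percentiles, which this sketch does not attempt.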

This is also motivated by current open issues in MongoDB: https://jira.mongodb.org/browse/SERVER-4929 https://jira.mongodb.org/browse/SERVER-7463

In the current proposal, we use the Folly implementation of t-digest (https://github.com/facebook/folly/blob/master/folly/stats/TDigest.h), which we integrated with the existing group aggregation accumulators as a new “$percentile” accumulator.

In this implementation, unlike the other accumulators, we pass the necessary parameters (percentile value and digest size) to the percentile accumulator. For example:

db.mycollection.aggregate({
  "$group": {
    _id: "$metadata",
    "my_percentile_result": {
      "$percentile": {
        "percentile": 0.95,
        "value": "$jitter",
        "digest_size": 1000
      }
    }
  }
})
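To make the intended result shape concrete, the following sketch computes an exact per-group percentile on the client, which is roughly the kind of out-of-database calculation the proposal aims to replace. It assumes the nearest-rank definition of a percentile. The proposed accumulator would return a t-digest approximation instead, so its values may differ. The sample documents and the helper names `exactPercentile` and `groupPercentile` are hypothetical.

```javascript
// Exact nearest-rank percentile of an array of numbers (assumed definition).
function exactPercentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil(p * sorted.length)); // 1-based rank
  return sorted[rank - 1];
}

// Client-side equivalent of grouping by one field and taking a percentile
// of another field within each group.
function groupPercentile(docs, groupField, valueField, p) {
  const groups = new Map();
  for (const doc of docs) {
    const key = doc[groupField];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(doc[valueField]);
  }
  return [...groups].map(([key, values]) => ({
    _id: key,
    my_percentile_result: exactPercentile(values, p),
  }));
}

// Hypothetical sample data; real "metadata" values may be sub-documents.
const sampleDocs = [
  { metadata: "router-a", jitter: 4 },
  { metadata: "router-a", jitter: 9 },
  { metadata: "cpe-b", jitter: 30 },
  { metadata: "router-a", jitter: 2 },
  { metadata: "cpe-b", jitter: 10 },
  { metadata: "router-a", jitter: 7 },
];

const grouped = groupPercentile(sampleDocs, "metadata", "jitter", 0.95);
// grouped has one { _id, my_percentile_result } entry per distinct metadata value
```

This approach has to pull every `jitter` value out of the database and sort it, which is the cost the built-in accumulator is meant to avoid.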

One challenge was passing the percentile parameters to the accumulator processor. We addressed it by passing the necessary parameters together with the documents to the accumulator processor. The proposal includes new unit tests for the percentile accumulator, and appropriate documentation will follow after your initial review.
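The parameter-passing idea can be pictured with a small JavaScript model. This is only a guess at the general shape, not the pull request's C++ code: the class `PercentileAccumulatorSketch`, its `process` and `getValue` methods, and the helper `evaluateSpec` are all hypothetical, and a sorted array stands in for the t-digest.

```javascript
// Hypothetical model: every input handed to the accumulator carries the
// parameters alongside the value, so no separate configuration step is needed.
class PercentileAccumulatorSketch {
  constructor() {
    this.params = null; // taken from the first input that arrives
    this.values = [];   // stand-in for the digest (assumption of this sketch)
  }

  process(input) {
    if (this.params === null) {
      this.params = {
        percentile: input.percentile,
        digestSize: input.digest_size, // unused here; a real digest would need it
      };
    }
    if (typeof input.value === "number") this.values.push(input.value);
  }

  getValue() {
    if (this.values.length === 0) return null;
    const sorted = [...this.values].sort((a, b) => a - b);
    const rank = Math.max(1, Math.ceil(this.params.percentile * sorted.length));
    return sorted[rank - 1]; // exact nearest-rank result instead of an estimate
  }
}

// Evaluate the accumulator argument against one document: "$field" strings
// are replaced by the field's value, constants are passed through unchanged.
function evaluateSpec(spec, doc) {
  const out = {};
  for (const [name, expr] of Object.entries(spec)) {
    const isFieldPath = typeof expr === "string" && expr.startsWith("$");
    out[name] = isFieldPath ? doc[expr.slice(1)] : expr;
  }
  return out;
}

const spec = { percentile: 0.95, value: "$jitter", digest_size: 1000 };
const acc = new PercentileAccumulatorSketch();
for (const jitter of [5, 1, 3, 2, 4]) {
  acc.process(evaluateSpec(spec, { jitter }));
}
const accResult = acc.getValue();
```

The trade-off in this model is that the constant parameters are re-evaluated and carried with every document, in exchange for leaving the accumulator interface unchanged.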

We are looking forward to your feedback on the proposed feature.

mehdiabolfathi avatar May 22 '19 11:05 mehdiabolfathi

Thanks for taking the time to create a pull request. Please take a look at our Contributor Guide since there are some preliminary steps to take before we can consider your pull request, like signing the contributor's agreement and creating a SERVER ticket.

Thanks, Danny

dhatcher42 avatar May 22 '19 15:05 dhatcher42

Actually, as you mentioned, this is exactly the feature described in SERVER-7463, so we can use that ticket to track the work here; don't worry about the JIRA items. However, we will need you to sign the contributor's agreement before we move forward.

dhatcher42 avatar May 22 '19 17:05 dhatcher42

Thank you for your work here! I'm excited to see that you solved the problem of passing arguments to accumulators. I haven't looked at the implementation yet. The query team is quite busy at the moment wrapping up some projects but is interested in working with you to review this and get it merged. With any new feature like this, there will need to be some consensus that the syntax is straightforward and consistent. Yours looks reasonable, but since it is the first accumulator to do such a thing, it's worth us taking some time to think about it. Stay tuned for updates, and do watch SERVER-7463 as Danny suggested.

cswanson310 avatar May 22 '19 17:05 cswanson310

@cswanson310 and @dhatcher42, for your information, the contributor's agreement is signed.

mehdiabolfathi avatar May 23 '19 07:05 mehdiabolfathi

Hi, what is the current state? Is there anything we can do to support you?

stephan-hof avatar Sep 25 '19 14:09 stephan-hof

Hi there, any progress by any chance? @cswanson310

mangotree3 avatar Apr 16 '20 02:04 mangotree3

Hi there, we would be very interested in this feature too. Has there been any progress on this PR? Thanks for your efforts. @cswanson310 @dhatcher42 @stephan-hof @mehdiabolfathi

paniterka avatar Jan 18 '21 12:01 paniterka

I'm surprised there's been a PR for this for a couple of years now. Seems it is very much outdated at this point but would be nice to see this merged.

Stephane-Ag avatar Aug 22 '22 15:08 Stephane-Ag