Add approximate grouping algorithm in multi-stage

Open gortiz opened this issue 1 year ago • 0 comments

In the single-stage query engine, Apache Pinot implements an approximate grouping algorithm, which is explained in its own documentation page.

The key point of this algorithm is that, when a query groups by, each segment does not return its complete results but at most a fixed number of groups (5000 by default). This reduces the amount of data Pinot needs to keep in memory, and in most cases the results are close enough to the exact result.
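To make the trade-off concrete, here is a minimal sketch of per-segment group trimming. It is not Pinot's implementation: the function names, the `max_groups` parameter, and the rule for choosing which groups to keep (the ones with the largest partial counts) are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable


def segment_group_counts(rows: Iterable[str], max_groups: int) -> Counter:
    """Count rows per key in one segment, returning at most max_groups groups."""
    counts = Counter(rows)
    # Assumption: trimming keeps the groups with the largest partial counts.
    return Counter(dict(counts.most_common(max_groups)))


def merge_segments(partials: Iterable[Counter]) -> Counter:
    """Combine the (possibly trimmed) per-segment results into a final result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Two hypothetical segments holding values of the group-by column.
segments = [
    ["a", "a", "a", "b", "b", "c"],
    ["c", "c", "c", "b", "b", "a"],
]

# A limit larger than the number of distinct keys gives the exact answer.
exact = merge_segments(segment_group_counts(s, max_groups=10) for s in segments)
# A small limit drops the least frequent group in each segment.
approx = merge_segments(segment_group_counts(s, max_groups=2) for s in segments)

print("exact: ", dict(exact))
print("approx:", dict(approx))
```

With a limit of 2, each segment drops its smallest group, so the merged counts for `a` and `c` are each one lower than the exact value while `b` is unaffected. Each segment ships fewer groups at the cost of a small error in the final result.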

This heuristic is not applied in the multi-stage engine. We should support:

  • In the short term:
    • Indicate in the documentation that this heuristic is not applied in multi-stage.
  • In the medium term:
    • Support this heuristic in multi-stage (either by default or opting in explicitly).
    • Have a way to disable/enable this heuristic.

gortiz • Jun 19 '24 10:06