pinot icon indicating copy to clipboard operation
pinot copied to clipboard

return pre-sorted value on IntermediateResultsBlock

Open walterddr opened this issue 3 years ago • 1 comments

Currently, if a query is run with the ORDER BY clause. data is

  • in selection only, data is sorted and trimmed to the limit
  • in aggregation/group-by/distinct, data is unsorted but trimmed to the limit

This means that broker needs to do a full sorting algorithm. Where as if the data is pre-sorted in IntermediateResultsBlock. broke can do a simple k-sorted merge algorithm. which reduces order by time from O(N*logN) to O(N * logK) where K is number of data tables returned.

The PRO for this is:

  • reduce the time complexity of the broker reduce
  • reduce the space complexity so broker only need to maintain a top-K instead of a top-N heap

The CON

  • servers need to pop data in O(N * logN) instead of O(N) loop through the priority queue.
  • implement sorting for agg/group-by/distinct as well.

walterddr avatar Aug 29 '22 14:08 walterddr

Also, right now brokers temporally store the data from servers in a hash map, which implies to hash is twice (once when it is being read from the server and another one when data from different servers is combined). By knowing that the data is sorted, brokers could simply store it in a list. The same happens when servers combine results from different segments.

gortiz avatar Aug 30 '22 05:08 gortiz