return pre-sorted value on IntermediateResultsBlock
Currently, if a query is run with the ORDER BY clause. data is
- in selection only, data is sorted and trimmed to the limit
- in aggregation/group-by/distinct, data is unsorted but trimmed to the limit
This means that broker needs to do a full sorting algorithm. Where as if the data is pre-sorted in IntermediateResultsBlock. broke can do a simple k-sorted merge algorithm. which reduces order by time from O(N*logN) to O(N * logK) where K is number of data tables returned.
The PRO for this is:
- reduce the time complexity of the broker reduce
- reduce the space complexity so broker only need to maintain a top-K instead of a top-N heap
The CON
- servers need to pop data in O(N * logN) instead of O(N) loop through the priority queue.
- implement sorting for agg/group-by/distinct as well.
Also, right now brokers temporally store the data from servers in a hash map, which implies to hash is twice (once when it is being read from the server and another one when data from different servers is combined). By knowing that the data is sorted, brokers could simply store it in a list. The same happens when servers combine results from different segments.