databend icon indicating copy to clipboard operation
databend copied to clipboard

Feature: Update sort algorithm using Loser Tree

Open Jake-00 opened this issue 2 years ago • 3 comments

Summary

Currently Databend's sort algorithm using MergSort based on heap. The main algorithms of multiple merging include heap sort, winner tree and loser tree. In these three algorithms, each heap adjustment of heap sorting requires comparison with the left and right child nodes, and the comparison times is 2logN, while the comparison times of winner tree and loser tree adjustment are logN. The difference is that the winner tree needs to compare with sibling nodes and update the parent node, while the loser tree only needs to compare with the parent node, and the memory access times are less.

Jake-00 avatar May 29 '23 02:05 Jake-00

I'm genuinely interested in this task, and if it's possible, I would be honored to take on the responsibility of completing it.

ct20000901 avatar Aug 15 '23 19:08 ct20000901

I wrote a loser_tree implementation, but I can't integrate it into src/query/pipeline/transforms/src/processors/transforms/sort/merger.rs.

https://github.com/forsaken628/databend/blob/c14c3085b7ba2e3841eeef0af47ab2935902463b/src/query/pipeline/transforms/src/processors/transforms/sort/loser_tree.rs

  • Changing the length (push, pop) requires rebuilding the entire tree, so the current implementation doesn't support push, and pop just flags it.
  • peek_top2 will be slower than find_bigger_child_of_root.

cc: @RinChanNOWWW

forsaken628 avatar Jun 21 '24 02:06 forsaken628

IMO, you can add a new trait named Merger(or other names like MergeAlgorithm) and let Heap and LoserTree implement the trait. The methods of the trait can be the public methods of the current HeapMerger. I think this is a more easy way to introduce your lose tree to the sort operator.

RinChanNOWWW avatar Jun 21 '24 13:06 RinChanNOWWW