Add a stable sorting comparison algorithm

Open dumblob opened this issue 11 years ago • 9 comments

Recently I ran into a real need for stable sorting. Currently we have sort( self: list<@T>, order: enum<ascend,descend> = $ascend, part = 0 ) => list<@T> and sort( self: list<@T>, part = 0 )[X: @T, Y: @T => int] => list<@T>. We basically have two options:

  1. enhance the interface with a stable = false argument and add some fast stable algorithm with O(1) memory consumption (see the sketch after this list).
  2. keep the interface unchanged and replace the quicksort with some stable algorithm.
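
For illustration, option 1 on the Dao side would just extend the existing signature with the new flag, e.g. sort( self: list<@T>, order: enum<ascend,descend> = $ascend, part = 0, stable = false ) => list<@T>. On the C side the dispatch could look roughly like the sketch below. This is only an assumption of how it might be wired up: list_sort and stable_sort_fallback are hypothetical names, not Dao's actual internals, and the stable path is a placeholder insertion sort (stable and O(1) extra space, but O(n^2); a real implementation would plug in Timsort or Block sort there).

    #include <stdlib.h>
    #include <string.h>

    typedef int (*cmp_fn)( const void *, const void * );

    /* Placeholder stable path: plain insertion sort. Stable and O(1)
       extra space, but O(n^2); a real implementation would use e.g.
       Timsort or Block sort here. Element size is capped for this sketch. */
    static void stable_sort_fallback( char *base, size_t n, size_t size, cmp_fn cmp )
    {
        char tmp[64];  /* sketch assumes size <= 64 */
        for( size_t i = 1; i < n; i += 1 ){
            size_t j = i;
            memcpy( tmp, base + i*size, size );
            while( j > 0 && cmp( base + (j-1)*size, tmp ) > 0 ){
                memcpy( base + j*size, base + (j-1)*size, size );
                j -= 1;
            }
            memcpy( base + j*size, tmp, size );
        }
    }

    /* Hypothetical dispatcher: one public sort() method, two code paths. */
    void list_sort( void *items, size_t n, size_t size, cmp_fn cmp, int stable )
    {
        if( stable ) stable_sort_fallback( items, n, size, cmp );
        else         qsort( items, n, size, cmp );  /* current unstable path */
    }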

I've looked around and see two candidates for a stable and fast algorithm (almost comparable to quicksort when implemented and used with general comparisons like Dao code sections):

  • Timsort (used in Python), which tries to minimize the number of comparisons, but on the other hand juggles a bit more with indices, pointers, etc. Its central idea is sketched after this list.
  • Block sort, which has all the best properties, but an implementation might carry a somewhat bigger constant overhead. There are two known implementations, https://github.com/BonzaiThePenguin/WikiSort/tree/master and https://github.com/Mrrl/GrailSort , and I personally like the GrailSort one (it's quite minimalistic and transparent compared to other Block sort implementations and achieves high performance without any buffers). The author of Timsort gives a slightly biased (read: "not based on measurements") opinion on WikiSort.
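
This is not Timsort itself, but a minimal sketch of its central idea: detect runs that are already sorted and merge them, so partially sorted input costs close to O(n). A natural merge sort over ints captures that; real Timsort adds reversal of descending runs, minimum run lengths, a run stack and galloping on top.

    #include <stdlib.h>
    #include <string.h>

    /* End (exclusive) of the non-decreasing run starting at 'lo'.
       (Real Timsort also reverses strictly descending runs and enforces
       a minimum run length; this sketch skips both.) */
    static size_t run_end( const int *a, size_t lo, size_t n )
    {
        size_t i = lo + 1;
        while( i < n && a[i-1] <= a[i] ) i += 1;
        return i;
    }

    /* Stable merge of a[lo..mid) and a[mid..hi); 'buf' holds the left run. */
    static void merge_runs( int *a, size_t lo, size_t mid, size_t hi, int *buf )
    {
        size_t nleft = mid - lo, i = 0, j = mid, k = lo;
        memcpy( buf, a + lo, nleft * sizeof(int) );
        while( i < nleft && j < hi )
            a[k++] = (buf[i] <= a[j]) ? buf[i++] : a[j++];  /* <= keeps it stable */
        while( i < nleft ) a[k++] = buf[i++];  /* right leftovers already in place */
    }

    /* Natural merge sort: merge adjacent existing runs until one run is left.
       Already sorted input is detected in a single O(n) pass. */
    void natural_merge_sort( int *a, size_t n )
    {
        if( n < 2 ) return;
        int *buf = malloc( n * sizeof(int) );
        int merged = 1;
        while( merged ){
            size_t lo = 0;
            merged = 0;
            while( lo < n ){
                size_t mid = run_end( a, lo, n );
                if( mid == n ) break;  /* tail is already a single run */
                size_t hi = run_end( a, mid, n );
                merge_runs( a, lo, mid, hi, buf );
                merged = 1;
                lo = hi;
            }
        }
        free( buf );
    }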

Btw there is also Introsort, which is a "better" quicksort: it preserves most of quicksort's attributes (like being cache-friendly) and guarantees O(n*log(n)) worst-case time complexity and O(log(n)) worst-case space complexity. But it's not stable.
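
The core of Introsort is just a recursion depth budget: plain quicksort until the budget (about 2*log2(n)) runs out, then heapsort on the remaining slice, which is what caps the worst case at O(n*log(n)). A minimal sketch over ints follows; a production version would add median-of-three pivot selection and an insertion-sort cutoff for small slices, both omitted here.

    #include <stddef.h>

    static void swap_int( int *a, int *b ){ int t = *a; *a = *b; *b = t; }

    /* Sift-down for the max-heap used by the heapsort fallback. */
    static void sift_down( int *a, size_t root, size_t n )
    {
        for(;;){
            size_t child = 2*root + 1;
            if( child >= n ) return;
            if( child + 1 < n && a[child] < a[child+1] ) child += 1;
            if( a[root] >= a[child] ) return;
            swap_int( &a[root], &a[child] );
            root = child;
        }
    }

    static void heap_sort( int *a, size_t n )
    {
        for( size_t i = n/2; i-- > 0; ) sift_down( a, i, n );
        for( size_t i = n; i-- > 1; ){
            swap_int( &a[0], &a[i] );
            sift_down( a, 0, i );
        }
    }

    /* Hoare partition with a[0] as pivot; returns the left part's size,
       always in [1, n-1], so both recursive calls shrink. */
    static size_t partition( int *a, size_t n )
    {
        int pivot = a[0];
        size_t i = (size_t)-1, j = n;
        for(;;){
            do i += 1; while( a[i] < pivot );
            do j -= 1; while( a[j] > pivot );
            if( i >= j ) return j + 1;
            swap_int( &a[i], &a[j] );
        }
    }

    /* Quicksort with a depth budget; heapsort once the budget is spent. */
    static void introsort_rec( int *a, size_t n, int depth )
    {
        if( n < 2 ) return;
        if( depth == 0 ){ heap_sort( a, n ); return; }
        size_t p = partition( a, n );
        introsort_rec( a, p, depth - 1 );
        introsort_rec( a + p, n - p, depth - 1 );
    }

    void introsort( int *a, size_t n )
    {
        int depth = 0;
        for( size_t m = n; m > 1; m >>= 1 ) depth += 2;  /* ~2*log2(n) */
        introsort_rec( a, n, depth );
    }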

dumblob avatar Oct 29 '14 10:10 dumblob

A stable sorting algorithm would indeed be better as the default sorting algorithm. But I prefer to avoid sophisticated sorting algorithms for simplicity. Also, considering that there is a constant memory overhead for each Dao object, and a constant overhead for comparisons (especially those invoking code sections), I think we can afford to use simpler algorithms that use more memory or involve more copying or swapping of pointers.

So I feel a modified quicksort could be a reasonable choice; such a modified quicksort can compare the data indices when the values are equal. The indices will take 4 or 8 bytes of extra memory per data object, which sounds like quite a bit, but if you consider the memory overhead of Dao data objects (e.g., 8-byte headers for boxed ints, floats and strings, 20- or 24-byte headers per boxed tuple, and more for user-defined types), that is not really much. So such a modified quicksort might be my first choice, since it is very simple to implement and makes it trivial to support both stable and unstable sorting in a single implementation.
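
A minimal sketch of that tie-breaking idea (not Dao's actual internals; libc qsort stands in for any quicksort): tag each element with its original index and let the comparator fall back to the index on equal values. With the fallback enabled no two elements ever compare equal, so even an unstable quicksort produces the stable order; disabling the flag gives the unstable variant from the same code.

    #include <stdio.h>
    #include <stdlib.h>

    /* Each element carries its original position (the 4-8 extra bytes). */
    typedef struct { int value; size_t index; } Tagged;

    static int stable_flag = 1;  /* one comparator, stable or unstable */

    static int cmp_tagged( const void *pa, const void *pb )
    {
        const Tagged *a = pa, *b = pb;
        if( a->value != b->value ) return a->value < b->value ? -1 : 1;
        if( ! stable_flag ) return 0;
        /* Equal values: original order decides, which makes the sort stable. */
        return a->index < b->index ? -1 : 1;
    }

    int main(void)
    {
        int data[] = { 3, 1, 3, 2, 1 };
        size_t n = sizeof(data)/sizeof(data[0]);
        Tagged *t = malloc( n * sizeof(Tagged) );
        for( size_t i = 0; i < n; i += 1 ){ t[i].value = data[i]; t[i].index = i; }
        qsort( t, n, sizeof(Tagged), cmp_tagged );  /* any quicksort works here */
        for( size_t i = 0; i < n; i += 1 )
            printf( "%d (was at %zu)\n", t[i].value, t[i].index );
        free( t );
        return 0;
    }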

Another choice could be a modified merge sort that uses a small amount of auxiliary space, but involves more copying.

daokoder avatar Nov 05 '14 11:11 daokoder

So I feel a modified quicksort could be a reasonable choice ...

Even then, I'm pretty sure that stable quicksort is slower than Timsort :(

I'm somewhat against the modified merge sort: I currently sort tuples, and given their size, the extra copying would very quickly degrade the overall performance (even for a small number of items, like a few hundred). Merge sort itself also has worse performance on average (always O(n*log(n))) compared to other sorting algorithms whose lower bound is linear (like Block sort), or whose lower bound is also O(n*log(n)) but with a smaller average.
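
To make the copying concern concrete, here is a hedged sketch (hypothetical Tuple layout, libc qsort as the sorter) of the usual workaround: sort an array of pointers (or indices) into the records instead of the records themselves, so every swap or merge moves 8 bytes rather than a whole tuple.

    #include <stdio.h>
    #include <stdlib.h>

    /* A stand-in for a "wide" tuple that would be expensive to copy. */
    typedef struct { int key; char payload[120]; } Tuple;

    static int cmp_tuple_ptr( const void *pa, const void *pb )
    {
        const Tuple *a = *(const Tuple * const *)pa;
        const Tuple *b = *(const Tuple * const *)pb;
        return (a->key > b->key) - (a->key < b->key);
    }

    int main(void)
    {
        Tuple data[] = { {3, ""}, {1, ""}, {2, ""} };
        size_t n = sizeof(data)/sizeof(data[0]);

        /* Sort 8-byte pointers; the 124-byte tuples never move. */
        Tuple **view = malloc( n * sizeof(Tuple*) );
        for( size_t i = 0; i < n; i += 1 ) view[i] = &data[i];
        qsort( view, n, sizeof(Tuple*), cmp_tuple_ptr );

        for( size_t i = 0; i < n; i += 1 ) printf( "%d\n", view[i]->key );
        free( view );
        return 0;
    }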

dumblob avatar Nov 05 '14 17:11 dumblob

Even then, I'm pretty sure that stable quicksort is slower than Timsort :(

Much slower or slightly slower? Different time complexity (I cannot view the formulas on the Wikipedia page, because the images are blocked from here)? Or is it just more efficient for partially sorted lists (this seems to be the case from the description)?

daokoder avatar Nov 05 '14 18:11 daokoder

Timsort seems to be slightly slower (say 85% of the speed) than stable quicksort for fully random data with small variance. For all other data Timsort is faster and also guarantees O(n*log(n)). A nice explanation of its architectural ideas is given by Volodymyr Korniichuk. One of the main ideas is that sorting is in most cases performed on data which was already sorted in the past and then modified. Btw quicksort is very bad in these common cases (it achieves O(n*log(n)) only in the best case).

Anyway, I've seen some comparison charts of sorting algorithms, and the difference between the unstable Quicksort/Introsort/Shellsort and stable algorithms was quite large (even circa 5x), so I'm still thinking about providing an option in the interface and implementing two algorithms (for the unstable one I'd choose Introsort or Shellsort with Marcin Ciura's gap sequence, and for the stable one Timsort). Introsort and Shellsort are very easy to implement (Shellsort with Ciura's gaps is sketched below). Full Timsort (the one used in Python) is not easy to implement - the difficulty is quite comparable to GrailSort.
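
As a minimal sketch of how little code Shellsort with Ciura's sequence needs: the gaps are 1, 4, 10, 23, 57, 132, 301, 701, and arrays longer than that are commonly handled by extending the sequence by a factor of about 2.25 (the fixed table below therefore only covers modest n).

    #include <stddef.h>

    /* Marcin Ciura's empirically derived gap sequence, largest first.
       For big arrays the sequence is usually extended by gap *= 2.25. */
    static const size_t ciura_gaps[] = { 701, 301, 132, 57, 23, 10, 4, 1 };

    void shell_sort( int *a, size_t n )
    {
        for( size_t g = 0; g < sizeof(ciura_gaps)/sizeof(ciura_gaps[0]); g += 1 ){
            size_t gap = ciura_gaps[g];
            if( gap >= n ) continue;
            /* Gapped insertion sort for this gap. */
            for( size_t i = gap; i < n; i += 1 ){
                int x = a[i];
                size_t j = i;
                while( j >= gap && a[j - gap] > x ){
                    a[j] = a[j - gap];
                    j -= gap;
                }
                a[j] = x;
            }
        }
    }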

I'm quite disappointed by the research in this area, as I haven't found any standardized sets of testing data, nor any current article with proper measurements using existing proven implementations of different algorithms. It's very hard to choose such an algorithm then :( For example, the almost abandoned heap sort has gotten faster in recent years, because we have more registers in common CPUs, and heap sort code is efficiently compiled to tight loops with constants and counters kept in these registers, which makes a huge difference in measurements. This means that measurements of different algorithms also change over time as compilers get smarter and hardware gets more efficient (caches etc.), so what one finds in papers only 5 years old is no longer true :(

dumblob avatar Nov 05 '14 19:11 dumblob

I found out that Python has quite a reasonable test suite for sorting algorithms. It's mentioned in an article describing Timsort written by the author himself.

dumblob avatar Nov 05 '14 21:11 dumblob

Just out of curiosity, Julia also has a Timsort implementation (but without some useful optimizations used in the genuine Python implementation), written in a very readable way: https://github.com/JuliaLang/SortingAlgorithms.jl/blob/master/src/SortingAlgorithms.jl and its license is MIT Expat.

dumblob avatar Nov 06 '14 14:11 dumblob

It seems research in the field of unstable sorting algorithms is still ongoing. The state-of-the-art unstable sort for parallel systems (which Dao kind of is :wink:) seems to be IPS4o. It's quite complex, though.

If we sacrifice the (otherwise very important :cry:) parallelism property, we still end up with Timsort among stable sorting algorithms (Timsort has in the meantime become way more widespread than before - it's the default in Apple Swift, in Java, etc.).

But if we insist that we don't need a fast stable sort, we could still replace the current quicksort with pdqsort, which, among other optimizations, uses the same trick as BlockQuicksort and ends up about 45% faster than a tuned quicksort according to the Rust pdqsort implementation's documentation.

dumblob avatar Jul 06 '19 18:07 dumblob

Hello! I was browsing GitHub, and came across this issue after curiously searching up "Grailsort". I know this issue hasn't been updated in a little bit, but I still thought I would leave this here...

I'm the manager of an open source project dedicated to researching, rewriting, and optimizing Grailsort. Our work is featured in our repo right over here: https://github.com/MusicTheorist/Rewritten-Grailsort

Please feel free to ask us any questions you'd like! It's an incredibly deep and underrated algorithm.

MusicTheorist avatar Jan 08 '21 06:01 MusicTheorist

@MusicTheorist thanks! That sounds like a valuable initiative. I hope something measurable will come out of it.

2020 seems to have brought some new wind into the topic of sorting. There is e.g. Quadsort - a tight competitor to Grailsort.

But now comes the surprise - a wisely implemented bubble sort for small arrays is the fastest on ILP CPUs (i.e. all CPUs since about 2000, including the vast majority of embedded ones). When combined with branchless Lomuto and Hoare partitioning schemes for quicksort, we have a clear speed winner of "all time" - see Hoare’s Rebuttal and Bubble Sort’s Comeback and the corresponding repo https://github.com/gerben-s/quicksort-blog-post (the core branch-free trick is sketched below).
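
A rough distillation of the branch-free idea (simplified from the blog post, not its actual code): swap unconditionally and advance the partition boundary by the arithmetic result of the comparison, so the hot loop contains no hard-to-predict branch and the CPU can exploit its instruction-level parallelism.

    #include <stddef.h>

    /* Branchless Lomuto-style partition: returns the count of elements
       < pivot. The swap is unconditional and the boundary advance is
       data-dependent arithmetic, so there is no branch to mispredict. */
    size_t partition_branchless( int *a, size_t n, int pivot )
    {
        size_t boundary = 0;  /* a[0..boundary) holds elements < pivot */
        for( size_t i = 0; i < n; i += 1 ){
            int x = a[i];
            a[i] = a[boundary];
            a[boundary] = x;
            boundary += (size_t)(x < pivot);  /* grow the "< pivot" prefix */
        }
        return boundary;
    }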

I'll copy over this post to the "Rewritten-Grailsort" repo, to discuss it there and not here.

dumblob avatar Jan 08 '21 12:01 dumblob