WikiSort
WikiSort copied to clipboard
Looking to improve Grailsort with ideas from Wiki; help wanted!!
Hello, Bonzai!
I've been quite interested in Wikisort and Block (Merge) Sort in general for a while now. Back in AP Comp Sci, you could say I was a bit floored when I found out a stable, O(1) space, and worst-case O(n log n) time sort actually existed. :P
I've also done a bit of personal studying into sorts, and forked over a decently popular "algorithm animation" program, which I've been working on for quite some time now! In fact, here are some videos I made regarding Wikisort (Feel free to use them!): WikiSort - Bar Graph: https://youtu.be/ciQG_uUG6O4 WikiSort - Scatter Plot: https://youtu.be/mMr4b0Yg4yg WikiSort - Color Circle: https://youtu.be/OWxuXJQ3Guw Wikisorting over 16k Numbers: https://youtu.be/X3SyGvfj1d8
I've been doing a lot of research on and off not only into Wikisort, but also Andrey Astrelin's Grailsort (I saw he talked to you in another post), which turns out to be a bit of a different implementation of Block Merge Sort. What it sacrifices in terms of adaptability in some places (best-case O(n log n) time instead of Wikisort's O(n) best-case), it makes up with raw speed in other cases (I would wager Grailsort's "Build Blocks" function is one of the fastest merge sorts out there).
I've done some visualizations of the sort here; hopefully they demonstrate the major differences between Wiki and Grail: Grail Sorting 2,048 Integers: https://youtu.be/LJbXsV_qGbs GrailSort Redistributing Its Internal Buffer: https://youtu.be/8SV18oH8kPc A Detailed Visual of GrailSort: https://youtu.be/U29NB96Y-9w Grailsorting over 16k numbers: https://youtu.be/kZJ7h307bcQ
I also have some refactored repos for Grailsort. They're rough drafts, but hopefully the code is easier to read than the original: Java Version: https://github.com/MusicTheorist/Grail-Sorting-for-Java C/C++ Version: https://github.com/MusicTheorist/Grail-Sort-Refactored
I'm also looking to create some documentation for both Grail and Block Merge Sort to make them much more intuitive, like what you've done with your docs.
I basically agree with Mr. Astrelin on the major changes between Grail and Wiki: "[Grail] mainly differs by swapping blocks and their tags in parallel before merging, and shifting the position of an internal buffer used for locally merging/appending portions of an array".
While Grailsort is a great algorithm, I do see some significant room for improvement, and I thought I would ask you for some help if you're interested! There are definitely aspects of Wikisort I think Grailsort would benefit from, and hopefully a conversation would help to make sure I understand Blocksort alright!
My questions for now would be:
- What do you consider the fastest/most efficient step(s) in Wikisort? What about the slowest/least efficient?
- I saw one of your closed issues asking if a backwards merge was possible. Is it, and even if it is, would it possibly break stability?
- What was your strategy for getting Wikisort to be adaptive?
- Do you have any updates on why Java's JIT compiler slows down Wikisort?
- Are there any important edge cases to watch out for when reordering/swapping A and B blocks?
- What's the complexity of your "extract keys" method? Grailsort's "find keys" method alone is O(n log n) comparisons in the worst-case, i.e. when there are less than 2 * sqrt(n) unique keys in the array.
Feel free to answer whenever and however detailed you want. Hope you're staying well during the pandemic!
Thanks for reading!!
- John
John! I feel like I'm slightly late in responding to this, and I know we covered some of the questions on the Discord, but I'll finally attempt to form a coherent response on here.
-
Every step of WikiSort is as efficient as I was personally able to make it, so you'd have to ask someone else. :)
-
I think I was just excited at the time from remembering that backwards merging is a thing that exists, so I created an issue about it. I'm trying to remember what backwards merge even is though (it's been a while) – is that when B is swapped over to the internal buffer instead of A? That would imply that merging two sorted arrays where one is of length 1000 and the other is of length 1 would only need a swap array containing 1 item to handle the merge rather than 1000, since we'd only be using the first B.length values from the internal buffer rather than the entire buffer. I suppose that would save (A.length - B.length) swaps if such a thing were possible.
-
We covered this on the Discord, but WikiSort only pulls out the internal buffer to the start or end of the array and keeps it there during the entire merge process, so when comparing A and B blocks or subarrays it's trivial to determine whether they are already sorted. That's all I did. My understanding is that Grailsort makes this much harder to implement due to the way it rolls its internal buffer through the array.
-
I have no idea why the Java version is randomly so slow and I wish I did. I haven't touched the Java version since I wrote it though.
-
Hm I'm not sure what you mean by this. The algorithm should handle any required edge cases, and the docs should cover it. As long as they're smaller than the internal buffer and are rolled into position correctly everything should work fine.
-
WikiSort extracts a buffer once per level of the merge, due to the inherent difficulty of being able to find √(array.length) unique values at the smallest levels of the merge, so since there are log(n) levels I'd have to imagine it isn't allowed to be any slower than O(n), right? From what I remember finding unique values is a linear search and rotating them to the start of the array has an overlap of up to √n values, so just thinking out loud it's probably O(n + √n) which is O(n). Can't guarantee that's 100% accurate though.
By the way it's good to remember that algorithmic simplicity is sometimes more important than absolute optimal running time. Back in 2014-2015 some places were considering WikiSort for internal uses but it's already a pretty unwieldy algorithm with many edge cases (and maybe even hacks) to achieve its current run time. Adding more layers of edge cases to shave some swaps here and there would just make it even less likely to be adopted, even if logically it would be a more efficient algorithm overall. People just want a reasonable guarantee that they (or someone else) won't accidentally break the algorithm in the future and have no idea how to fix it.
Grailsort by comparison is pretty clean and short – although I really wish it had some solid documentation!
- Mike
It's on the way, and you're helping us out a ton!! :)