cpp-sort icon indicating copy to clipboard operation
cpp-sort copied to clipboard

Can you adapt JesseSort?

Open lewj85 opened this issue 9 months ago • 4 comments

It's faster than Powersort. Python and paper here: https://github.com/lewj85/jessesort

lewj85 avatar Feb 13 '25 16:02 lewj85

I haven't heard of it. I will look into it when I have tile, which may take a few weeks.

Morwenn avatar Feb 16 '25 16:02 Morwenn

Hi again, I'm very sorry I took so long to answer and the weeks turned to months x.x

I'm just starting to read the paper again, I'm barely at the description of rainbows and insertion, and it reminds me a lot of encroaching lists, introduced by Skiena in Encroaching Lists As A Measure Of Presortedness.

Skiena then introduces a sorting algorithm called melsort that builds such encroaching lists to form something that looks very much like your rainbow, then merges them to get a sorted collection.

I have a fairly unoptimized implementation of melsort in this very project, and a measure of presortedness Enc that computed the number of encroaching lists that are built by a linear scan over a sequence of elements.

I think that the same idea is also used in the paper Patience is a Virtue: Revisiting Merge and Sort on Modern Processors by Chandramouli and Goldstein, a paper you can read for free on Microsoft Research. They start with patience sort and add elements to both the top and bottom of piles if I remelber correctly, which likely corresponds to encroaching lists, and to your rainbow.

I'm gonna resume reading your paper now. So far I'm very early in it, sorry if the observations above are not accurate.

Morwenn avatar Jun 27 '25 14:06 Morwenn

I'm done reading your article, and I am now fairly convinced that your rainbow data structure and Skiena's encroaching list set are are the same from a functional point of view (he also compares them to Young Tableaux and uses the similarities for proofs, but I don't know anything about those). If I'm not mistaken, you both have similar conclusions about some properties of the structure such as the expected number of lists/bands. I believe that Skiena's paper has some information that your paper lacks (for example the irony of n/2 lanes being optimal in its own way), but the reverse is also true. For example I can't find that $\sqrt{8}\sqrt{n}$ guess anywhere in Skiena's work, your heuristic of using insertion point of the previously inserted element also feels new - you might want to read Skiena's paper to draw a consolidated picture of the data structure and make use of the best of both worlds.

Now Skiena deals with lists for simplicity, and does not make any effort to put elements into contiguous memory, which differs a lot from your proposed rainbow and split rainbow proposal in that regard. Similarly his paper doesn't talk about merging a lot and only proposes pairwise merging, I assume that it's also for simplicity, but research about optimal merging was also very lacking in 1988 compared to what it is today. On the other hand I remember that the Patience is a virtue paper talks a lot about merging, but I don't think I ever gave it a proper read. It's older than powersort too, so such an optimal runs merge strategy likely didn't exist at the time it was written.

I'm gonna keep comparing the three papers a bit if I find time and motivation. I don't plan to change my melsort implementation to be non-list-based (because I actually like linked list and spent some amount of time to optimize my implementation in cute ways). but I might try to reimplement JesseSort and compare both - I do expect JesseSort to win, be it only because it actually cares about memory locality.

Good job either way 😄

Morwenn avatar Jun 28 '25 11:06 Morwenn

Also another information that you might find interesting: building on melsort, C. Levcopoulos and O. Petersson proposed another sorting algorithms called slabsort in their paper Sorting Shuffled Monotone Sequences. I also have an implementation, but I haven't really tried hard to make it fast, though it's highly adaptive, which is pretty impressive.

Slabsort tries to call melsort at different points, and gives up when the number of encroaching lists gets too big, to attempt a different strategy, then tries to call melsort again. I think that the algorithm would work just as well by changing melsort with JesseSort, which might actually be an interesting experiment to benchmark how JesseSort compares as part of another algorithm.

Morwenn avatar Jun 28 '25 11:06 Morwenn