VxSort
VxSort copied to clipboard
Remove BitonicSort15V/BitonicSort16V?
BitonicSort<T>.Sort(...) is only called in HybridSort when the length is less than SMALL_SORT_THRESHOLD_ELEMENTS = 112 so it never uses the larges of the BitonicSortxxV methods (i.e. 15 & 16).
It is called in CopyAndSortWithBitonic which uses a larger threshold of BitonicSort<T>.MaxBitonicSortSize = 128 but this could just be shrunk for what is really just a small use case.
Anyway, once again, I understand the whole thing is a bit of a WIP, but I always find I never get back to these things if I don't do it sooner rather than later! (Well this Issue will remind you anyway :-)
Thanks for looking and interacting about the internals :) I'm really looking forward to these sorts of discussions...
That's a good discussion to be had, both in this repo and even more so as part of a future PR.
The reasoning for why this asymmetry even exists is that I wanted to make sure I'm offering a considerable speedup on every reasonable size of sorting.
This is why the CopyAndSortWithBitonic exists, to begin with.
The reasoning was that extending this all the way to 16V for CopyAndSortWithBitonic specifically makes more sense from a speed perspective than a single partitioning call + 2x Bitonic calls for smaller sizes.
The downside is that the amount of generated native code for the large bitonic sort cases is prohibitive.
I think that I to figure out why VxSort even needs to exists as a nuget/project.
For the time being, it's sole reason to exist is to get bug reports, and provide a rapid env. to push more 32 bit (unsigned/float) and 64-bit variants.
Once a PR will be submitted for CoreCLR, I presume that the version we end up shipping (if any) as part of CoreCLR will be very limited due to code size considerations.
I wouldn't be surprised if unrolling, for example, will be limited on that version to 4x, and BitonicSort will probably be limited to 8V, just to keep the total code size (esp. when considering multiple versions for 32 bits and 64 bits are required).
I guess that what I'm trying to say is that going forward, VxSort, if it continues to exist outside of CoreCLR + blog posts, can only have one reason to exist: If CoreCLR ends up shipping a reduced, e.g. 7x speed-up version, mostly due to code-size limitations, VxSort can live on, externally as the extreme 11x/12x speed-up version for those who actually need it.
In other words, I think your comment/suggestion, going forward, is the sort of thing I would immediately recognize/accept as true for the future PR/CoreCLR version and makes less sense for an "extreme-all-limitation-are-off" VxSort.
Thoughts?
I think that's all pretty reasonable!
Ideally I think it would be best if .net had a plug in sort, so that it could be specified via some sort of configuration setting what the standard library exposed. (i.e. which would then be used under the hood by any code that calls Array.Sort, Linq's OrderBy etc. i.e. not just explicit use such as through the nuget package), but given that the architecture of .net doesn't really support that I can't see that happening (but hey, might be worth a suggestion)
Hmmm. And had a bit more of an explore around the code, trying to think how it could, without code bloat, be expanded to many Types. But yeah that seems like an issue. Also I was planning of checking what the AVX instructions for comparison do in the case of floating point NaNs, to check that you can mimic the same behaviour as existing sort. And also I was thinking that as MoveMask is used and only accepts float/double (32/64 bit) then either a different technique would need to be used for smaller Types. Anyway, just a bit of a brain dump here. Just things I was going to explore, but I went to the pub instead :-)
Maybe if I read this in the morning and it doesn't make sense I'll rewrite it!