pdqsort
pdqsort copied to clipboard
Support for C++17 parallel execution?
As a drop-in replacement for std::sort, do you have plans to add support for parallel execution in pdqsort?
As I'm sure you know, sort can now do this,
std::sort(std::execution::par, lines.begin(), lines.end());
I don't have plans currently. I would have to do some research on modern standard C++ parallel programming, and there are some tricky things in pdqsort if you want to parallelize. In particular it is assumed the left partition is recursed on first.
I currently don't have the time to do this.
Understood. Thanks for the response.
It may be helpful if you could to outline an approach that someone else could investigate and implement (like you've already pointed out with the left partition assumption). Even a just a few bullet points to get someone started.
As always YMMV, but just to give you an idea of what I'm seeing in my application, here's a simple comparison of the "sort phase" (a timer around std:sort or pdqsort) consisting of 25M text UUIDs,
~38 seconds - std::sort
~17 seconds - std::sort parallel
~25 seconds - pdqsort
This is with g++'s C++17 support and Intel's TBB implementation for parallel std::sort (the only thing that seems to work right now for g++; e.g., g++ --std=c++17 ..... -ltbb).
To be clear, pdqsort is far more CPU efficient because the parallel sort is at 100% CPU utilization on 4 cores for those 17 seconds whereas pdqsort is at 100% on only one core for 25 seconds (and pdqsort beats the single-threaded std::sort handily).
FYI
Have you tried the parallel sort implemented by parasort.h? https://github.com/baserinia/parallel-sort On line 36 of parasort.h, you could try to replace std::sort with pdqsort. Line 36 sorts a subset of elements within a single thread.
You can parallelize pdqsort with poolSTL's pluggable_sort:
poolstl::pluggable_sort(poolstl::par, lines.begin(), lines.end(), pdqsort);
pluggable_sort performs the first few steps of quicksort until there are enough partitions to fill available cores then delegates each partition to the specified sequential method, in this case pdqsort. Works very well.
Sorting 100M integers (M1 chip, 6+2 cores):
6903 ms - std::sort
2313 ms - pdqsort
1323 ms - std::sort(std::execution::par) uses TBB's sort
1278 ms - poolstl::pluggable_sort(std::sort)
697 ms - poolstl::pluggable_sort(pdqsort)
I created parallel quicksort, which is highly inspired by pdqsort: https://github.com/GabTux/PPQSort
ppqsort::sort(ppqsort::execution::par, input.begin(), input.end());
Benchmarks seem promising; see the repository. IPS4o seems the fastest overall but it needs external dependencies (oneTBB). So, the PPQSort, cpp11sort, or poolSTL are great choices for a fast parallel algorithm without external dependencies.