pdqsort icon indicating copy to clipboard operation
pdqsort copied to clipboard

Support for C++17 parallel execution?

Open charles-randall opened this issue 4 years ago • 5 comments

As a drop-in replacement for std::sort, do you have plans to add support for parallel execution in pdqsort?

As I'm sure you know, sort can now do this,

std::sort(std::execution::par, lines.begin(), lines.end());

charles-randall avatar Apr 18 '21 01:04 charles-randall

I don't have plans currently. I would have to do some research on modern standard C++ parallel programming, and there are some tricky things in pdqsort if you want to parallelize. In particular it is assumed the left partition is recursed on first.

I currently don't have the time to do this.

orlp avatar Apr 20 '21 09:04 orlp

Understood. Thanks for the response.

It may be helpful if you could to outline an approach that someone else could investigate and implement (like you've already pointed out with the left partition assumption). Even a just a few bullet points to get someone started.

As always YMMV, but just to give you an idea of what I'm seeing in my application, here's a simple comparison of the "sort phase" (a timer around std:sort or pdqsort) consisting of 25M text UUIDs,

~38 seconds - std::sort
~17 seconds - std::sort parallel 
~25 seconds - pdqsort      

This is with g++'s C++17 support and Intel's TBB implementation for parallel std::sort (the only thing that seems to work right now for g++; e.g., g++ --std=c++17 ..... -ltbb).

To be clear, pdqsort is far more CPU efficient because the parallel sort is at 100% CPU utilization on 4 cores for those 17 seconds whereas pdqsort is at 100% on only one core for 25 seconds (and pdqsort beats the single-threaded std::sort handily).

FYI

charles-randall avatar Apr 20 '21 14:04 charles-randall

Have you tried the parallel sort implemented by parasort.h? https://github.com/baserinia/parallel-sort On line 36 of parasort.h, you could try to replace std::sort with pdqsort. Line 36 sorts a subset of elements within a single thread.

stumarcus314 avatar Nov 03 '22 02:11 stumarcus314

You can parallelize pdqsort with poolSTL's pluggable_sort:

poolstl::pluggable_sort(poolstl::par, lines.begin(), lines.end(), pdqsort);

pluggable_sort performs the first few steps of quicksort until there are enough partitions to fill available cores then delegates each partition to the specified sequential method, in this case pdqsort. Works very well.

Sorting 100M integers (M1 chip, 6+2 cores):

6903 ms - std::sort
2313 ms - pdqsort
1323 ms - std::sort(std::execution::par) uses TBB's sort
1278 ms - poolstl::pluggable_sort(std::sort)
 697 ms - poolstl::pluggable_sort(pdqsort)

alugowski avatar Jan 25 '24 06:01 alugowski

I created parallel quicksort, which is highly inspired by pdqsort: https://github.com/GabTux/PPQSort

ppqsort::sort(ppqsort::execution::par, input.begin(), input.end());

Benchmarks seem promising; see the repository. IPS4o seems the fastest overall but it needs external dependencies (oneTBB). So, the PPQSort, cpp11sort, or poolSTL are great choices for a fast parallel algorithm without external dependencies.

GabTux avatar Apr 13 '24 13:04 GabTux