x86-simd-sort icon indicating copy to clipboard operation
x86-simd-sort copied to clipboard

Merge sort and AVX512

Open sergii-mamedov opened this issue 2 years ago • 3 comments

Hi!

Many thanks for your contribution to speeding up sorting in numpy. Wanted to ask if there are any plans to speed up merge/tim sort with AVX2/AVX512?

sergii-mamedov avatar Nov 05 '23 16:11 sergii-mamedov

Hi @sergii-mamedov that isn't in my to do list (at least not yet) unless there is a good enough justification to work on those. Is there a reason why merge/tim sort will be preferred over quicksort?

r-devulap avatar Nov 09 '23 21:11 r-devulap

Hi @r-devulap I was supported the METASPACE project. Sorting is one of the most time-consuming steps. We have from a hundred to ten thousand already sorted arrays (m/z spectra), which we need to combine into one array and sort. The total size of arrays ranges from hundreds of megabytes to hundreds of gigabytes. I tested quicksort vs mergesort (in numpy terminology) two years ago - mergesort was 5-10 times faster. Last weekend I tested the latest release of numpy - mergesort is still about twice as fast on my data. Also, a feature of merge/tim sort is that it is a stable variant of sorting, that is, the order of elements is determined. This is important for this task.

sergii-mamedov avatar Nov 09 '23 22:11 sergii-mamedov

curious as to what data type and distribution you have in your application. On NumPy benchmarks, quicksort is slower than merge sort only for ordered and reverse arrays:

             quick   float64        ('random',)         37.9±0.02μs
              quick   float64        ('ordered',)         37.3±0.4μs
              quick   float64       ('reversed',)         37.8±0.2μs
              quick   float64        ('uniform',)        5.44±0.04μs
              quick   float64    ('sorted_block', 10)    36.2±0.08μs
              quick   float64   ('sorted_block', 100)    39.7±0.05μs
              quick   float64   ('sorted_block', 1000)    43.5±0.1μs
              merge   float64        ('random',)          609±0.7μs
              merge   float64        ('ordered',)        18.3±0.03μs
              merge   float64       ('reversed',)        12.7±0.04μs
              merge   float64        ('uniform',)        18.3±0.03μs
              merge   float64    ('sorted_block', 10)      169±1μs
              merge   float64   ('sorted_block', 100)     123±0.8μs
              merge   float64   ('sorted_block', 1000)   68.2±0.06μs

r-devulap avatar Nov 09 '23 23:11 r-devulap