x86-simd-sort
C++ template library for high-performance SIMD-based sorting algorithms
This patch adds support for descending kv-sort, and for ascending/descending kv-select and kv-partial_sort. For reference, some benchmarks comparing against PyTorch's scalar implementation are provided: With normally distributed float32: ``` Partial Sort...
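To make the key-value semantics concrete, here is a minimal scalar sketch of a descending kv-sort, the kind of reference a SIMD version could be checked against. The function name `kv_sort_descending_ref` is hypothetical and is not part of the x86-simd-sort API; the sketch assumes keys and values are stored in two parallel `std::vector`s of equal length and that the keys contain no NaNs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical scalar reference (not the x86-simd-sort API): sort `keys` in
// descending order and apply the same permutation to `values`.
// Assumes keys contain no NaNs, since `>` is not a strict weak ordering with NaN.
template <typename K, typename V>
void kv_sort_descending_ref(std::vector<K>& keys, std::vector<V>& values) {
    assert(keys.size() == values.size());
    std::vector<std::size_t> order(keys.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    // stable_sort keeps pairs with equal keys in their original relative order.
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return keys[a] > keys[b]; });
    std::vector<K> sorted_keys;
    std::vector<V> sorted_values;
    sorted_keys.reserve(keys.size());
    sorted_values.reserve(values.size());
    // Gather both arrays through the same permutation so each value stays with its key.
    for (std::size_t i : order) {
        sorted_keys.push_back(keys[i]);
        sorted_values.push_back(values[i]);
    }
    keys.swap(sorted_keys);
    values.swap(sorted_values);
}
```

The stable sort is chosen only so the sketch produces reproducible output. A SIMD kv-sort is not necessarily stable, so a harness comparing against this reference would typically compare the keys exactly and check that every key still carries a value it was originally paired with.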
Split different configurations of Meson
The error occurs when building NumPy with avx512f as the baseline CPU feature.
Hello, I just want to make sure I'm not doing something wrong. I stumbled upon this project when I found that IPP's radix sort had buffer size calculations that were...
Hi, the build fails on some asm instructions: ``` The Meson build system Version: 0.55.1 Source dir: repos/x86-simd-sort Build dir: repos/x86-simd-sort/builddir Build type: native build Project name: x86-simd-sort Project version:...
Hi! Many thanks for your contribution to speeding up sorting in NumPy. I wanted to ask whether there are any plans to speed up merge sort/timsort with AVX2/AVX512.
Currently, argsort only supports (unsigned) int64_t indices. For 32-bit types like int/float, I think an argsort with int32_t as the index type could also be provided, which may perform better...
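A scalar sketch of what this request amounts to: an argsort whose index type is a template parameter, so the same routine can return `int32_t` or `uint64_t` indices. `argsort_ref` is a hypothetical name, and `std::stable_sort` stands in for the SIMD algorithm. A plausible motivation for 32-bit indices is lane width: a 512-bit register holds 16 32-bit values but only 8 64-bit indices, so 32-bit indices would line up one-to-one with 32-bit keys.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical scalar argsort with a caller-chosen index type (e.g. int32_t or uint64_t).
// Returns the indices that would sort `data` in ascending order.
template <typename IndexT, typename T>
std::vector<IndexT> argsort_ref(const std::vector<T>& data) {
    // A narrower index type only works if every position is representable in it.
    assert(data.size() <= static_cast<std::size_t>(std::numeric_limits<IndexT>::max()));
    std::vector<IndexT> idx(data.size());
    std::iota(idx.begin(), idx.end(), IndexT{0});
    // stable_sort resolves ties by original position, keeping the output deterministic.
    std::stable_sort(idx.begin(), idx.end(), [&](IndexT a, IndexT b) {
        return data[static_cast<std::size_t>(a)] < data[static_cast<std::size_t>(b)];
    });
    return idx;
}
```

For example, `argsort_ref<std::int32_t>(std::vector<float>{0.5f, -1.0f, 2.0f, 0.5f})` yields the indices `{1, 0, 3, 2}`, the same ordering the `uint64_t` instantiation produces, only in half the memory per index.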
The original implementation of the partial sorting algorithm was very basic: an initial `avx512_qselect` pass followed by `avx512_qsort`. This works, but unfortunately it can cause unnecessary comparisons and movement of...
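The two-pass structure described above can be sketched with standard-library stand-ins: `std::nth_element` plays the role of `avx512_qselect` and `std::sort` the role of `avx512_qsort`. This is a scalar illustration of the approach's shape under that substitution, not the library's implementation; `partial_sort_select_then_sort` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar analogue of the select-then-sort partial sort:
// std::nth_element stands in for avx512_qselect, std::sort for avx512_qsort.
// On return, arr[0..k) holds the k smallest elements in ascending order.
template <typename T>
void partial_sort_select_then_sort(std::vector<T>& arr, std::size_t k) {
    k = std::min(k, arr.size());
    if (k == 0) return;
    // Pass 1 (selection): arr[k-1] becomes the k-th smallest element,
    // everything before it is <= it and everything after it is >= it.
    std::nth_element(arr.begin(), arr.begin() + (k - 1), arr.end());
    // Pass 2 (sort): order the prefix; arr[k-1] is already in its final place.
    std::sort(arr.begin(), arr.begin() + (k - 1));
}
```

In this shape the second pass sorts the prefix from scratch even though the selection pass has already partitioned it around the k-th element, which is one plausible source of the redundant work mentioned above. For comparison, the standard library's `std::partial_sort` is typically implemented with a heap rather than a selection pass.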