x86-simd-sort issues

Improve argsort for 32-bit

32-bit argsort uses ymm registers: we can switch to zmm registers (use 2x i64gather instructions) and add new bitonic networks.
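The "2x i64gather" idea can be sketched roughly as follows. With 64-bit indices, one zmm register holds only 8 indices, so filling a zmm register with 16 32-bit keys takes two gathers, each returning a ymm-sized half. The helper name `gather16_keys` is hypothetical and not part of x86-simd-sort. The scalar fallback is only there so the sketch compiles without AVX-512F.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical helper (not part of x86-simd-sort): gathers 16 32-bit keys
// addressed by 16 64-bit indices into out[0..15].
inline void gather16_keys(const int32_t* keys, const int64_t* idx, int32_t* out)
{
#if defined(__AVX512F__)
    // 16 x 64-bit indices span two zmm registers (8 indices each).
    __m512i idx_lo = _mm512_loadu_si512(idx);
    __m512i idx_hi = _mm512_loadu_si512(idx + 8);
    // Each i64gather returns 8 x 32-bit keys in a ymm register.
    __m256i k_lo = _mm512_i64gather_epi32(idx_lo, keys, 4);
    __m256i k_hi = _mm512_i64gather_epi32(idx_hi, keys, 4);
    // Combine the two ymm halves into a single zmm holding 16 keys.
    __m512i k = _mm512_inserti64x4(_mm512_castsi256_si512(k_lo), k_hi, 1);
    _mm512_storeu_si512(out, k);
#else
    // Scalar fallback with the same result, used when AVX-512F is not enabled.
    for (std::size_t i = 0; i < 16; ++i) out[i] = keys[idx[i]];
#endif
}
```

The resulting zmm register of keys could then feed a 16-lane bitonic network, which is presumably what "new bitonic networks" refers to.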

r-devulap

How does x86-simd-sort compare to vxsort-cpp?

1

Have you tried comparing to vxsort-cpp (https://github.com/damageboy/vxsort-cpp)?

nietras

Can I use this lib to output the original indices of the sorted data?

6

This lib looks awesome! But I need to output the original indices of the sorted data. For example, the input array is {300 200 400 99 150 50...
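What is being asked for here is usually called an argsort. A minimal scalar sketch using only the C++ standard library is shown below. It does not use x86-simd-sort, and the name `argsort_indices` is hypothetical. It only illustrates the expected output.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: returns the indices that would sort `data` ascending.
// `data` itself is left unmodified.
template <typename T>
std::vector<std::size_t> argsort_indices(const std::vector<T>& data)
{
    std::vector<std::size_t> idx(data.size());
    std::iota(idx.begin(), idx.end(), 0); // idx = 0, 1, 2, ...
    // Sort the indices by the values they point to; stable_sort keeps
    // equal values in their original relative order.
    std::stable_sort(idx.begin(), idx.end(),
                     [&data](std::size_t a, std::size_t b) {
                         return data[a] < data[b];
                     });
    return idx;
}
```

Taking just the six values visible in the excerpt, {300, 200, 400, 99, 150, 50}, this returns {5, 3, 4, 1, 0, 2}.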

xiangyunzhou

Performance on AMD 7950X ...

11

Hello, I ran the benchmarks on a 7950X CPU. In some tests performance is up to 2.3x faster, but in others it is much slower (around 0.3x) compared to classical sorting. Is...

gregy4

Improve vector FP16 comparison function

I suspect this function https://github.com/intel/x86-simd-sort/blob/7d7591cf5927e83e4a1e7c4b6f2c4dc91a97889f/src/avx512-16bit-qsort.hpp#L65 can be improved with fewer operations. See: https://github.com/numpy/numpy/blob/0bd56e7ec12f8ceeb8d082340e71e60b873d5c57/numpy/core/src/npysort/npysort_common.h#L153 for reference.
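For context, one well-known way to compare FP16 values with few integer operations is to map each bit pattern to an order-preserving unsigned key. The scalar sketch below illustrates that general technique only. It is not taken from either linked function, and the names `fp16_order_key` and `fp16_less` are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: maps an IEEE 754 binary16 bit pattern to a uint16_t
// key whose unsigned order matches the numeric order of non-NaN values.
inline uint16_t fp16_order_key(uint16_t bits)
{
    // Negative values: flip all bits so larger magnitudes sort first.
    // Non-negative values: set the sign bit so they sort above negatives.
    return (bits & 0x8000u) ? static_cast<uint16_t>(~bits)
                            : static_cast<uint16_t>(bits | 0x8000u);
}

// Less-than on raw FP16 bit patterns via the ordered keys.
inline bool fp16_less(uint16_t a, uint16_t b)
{
    return fp16_order_key(a) < fp16_order_key(b);
}
```

Under this mapping -0 orders before +0, NaNs with the sign bit clear order after +inf, and NaNs with the sign bit set order before -inf, so a sort that needs all NaNs at the end would have to handle them separately.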

r-devulap

Request to Expose partition_unrolled Function for Public Use

4

Dear x86-simd-sort maintainers, I am currently working on a high-performance sorting algorithm to handle billions of uint64_t data entries. To optimize the sorting process, I am leveraging parallel execution and...

game-difficulty