x86-simd-sort icon indicating copy to clipboard operation
x86-simd-sort copied to clipboard

Cleanup for single vector sort/bitonic merge (and minor cleanup for argsort/argselect)

Open sterrettm2 opened this issue 1 year ago • 0 comments

This patch rewrites all of the single vector sorting and bitonic merging to use swizzle ops and generic masks to reduce code duplication. It also centralizes all of this logic into one file. Also I did a small cleanup to the argsort code

This should have no impact on performance at all, and my testing seems to confirm this:

AVX512 1 million performance
Benchmark                                                              Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------------------------
[simdsort/random_1m vs. simdsort/random_1m]/uint64_t                -0.0000         -0.0000       7758757       7758751       7758474       7758413
[simdsort/random_1m vs. simdsort/random_1m]/int64_t                 +0.0044         +0.0043       7753754       7787564       7753621       7786777
[simdsort/random_1m vs. simdsort/random_1m]/uint32_t                +0.0031         +0.0031       3601040       3612155       3600724       3611744
[simdsort/random_1m vs. simdsort/random_1m]/int32_t                 -0.0007         -0.0007       3615923       3613349       3615524       3613013
[simdsort/random_1m vs. simdsort/random_1m]/uint16_t                +0.0022         +0.0021     121162973     121427061     121160994     121420633
[simdsort/random_1m vs. simdsort/random_1m]/int16_t                 -0.0024         -0.0023     121280015     120989927     121272200     120988256
[simdsort/random_1m vs. simdsort/random_1m]/float                   +0.0039         +0.0038       3482392       3495850       3482070       3495381
[simdsort/random_1m vs. simdsort/random_1m]/double                  -0.0048         -0.0048       6564316       6533029       6564143       6532438
OVERALL_GEOMEAN                                                     +0.0007         +0.0007             0             0             0             0
AVX512 1k performance
Benchmark                                                              Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------------------------
[simdsort/random_1k vs. simdsort/random_1k]/uint64_t                +0.0034         +0.0033          4372          4387          4377          4392
[simdsort/random_1k vs. simdsort/random_1k]/int64_t                 -0.0019         -0.0016          4373          4365          4375          4368
[simdsort/random_1k vs. simdsort/random_1k]/uint32_t                +0.0022         +0.0010          2459          2465          2469          2471
[simdsort/random_1k vs. simdsort/random_1k]/int32_t                 +0.0066         +0.0075          2453          2469          2463          2482
[simdsort/random_1k vs. simdsort/random_1k]/uint16_t                -0.0042         -0.0042         64846         64577         64860         64588
[simdsort/random_1k vs. simdsort/random_1k]/int16_t                 -0.0009         -0.0010         65761         65699         65786         65718
[simdsort/random_1k vs. simdsort/random_1k]/float                   +0.0099         +0.0099          2295          2318          2305          2328
[simdsort/random_1k vs. simdsort/random_1k]/double                  +0.0020         +0.0011          3792          3800          3800          3805
OVERALL_GEOMEAN                                                     +0.0021         +0.0020             0             0             0             0                                                    +0.0014         +0.0015             0             0             0             0
AVX512 16 value performance
Benchmark                                                              Time             CPU      Time Old      Time New       CPU Old       CPU New
---------------------------------------------------------------------------------------------------------------------------------------------------
[simdsort/random_16 vs. simdsort/random_16]/uint64_t                -0.0004         -0.0002           859           859           862           862
[simdsort/random_16 vs. simdsort/random_16]/int64_t                 +0.0008         +0.0007           861           862           864           865
[simdsort/random_16 vs. simdsort/random_16]/uint32_t                +0.0004         +0.0005           850           850           853           853
[simdsort/random_16 vs. simdsort/random_16]/int32_t                 -0.0002         -0.0002           851           851           854           854
[simdsort/random_16 vs. simdsort/random_16]/uint16_t                +0.0000         -0.0000          1124          1124          1126          1126
[simdsort/random_16 vs. simdsort/random_16]/int16_t                 +0.0024         +0.0023          1133          1135          1135          1138
[simdsort/random_16 vs. simdsort/random_16]/float                   +0.0062         +0.0067           868           873           870           876
[simdsort/random_16 vs. simdsort/random_16]/double                  +0.0018         +0.0019           867           868           869           871
OVERALL_GEOMEAN                                                     +0.0014         +0.0015             0             0             0             0

I also simply deduplicated the logic for argsort/argselect, and cleaned up some naming there

Note that currently the test suite does not test the single vector sorting logic for qsort. The key-value versions are effectively tested by the kvsort tests, but nothing ends up testing the non key-value versions.

sterrettm2 avatar May 16 '24 18:05 sterrettm2