x86-simd-sort icon indicating copy to clipboard operation
x86-simd-sort copied to clipboard

Adds support for kv-select, kv-partial sort, and descending order for all key-value functions.

Open sterrettm2 opened this issue 1 year ago • 0 comments

This patch adds support for descending kv-sort and ascending/descending kv-select and kv-partial_sort For reference, some benchmarks comparing to Pytorch's scalar implementation are provided:

With normally distributed float32:

Partial Sort (sorted topk):                                                                           
size           default        avx2           avx512         AVX2 speedup % AVX512 speedup %
16             4.180374384    4.0826931      4.202754736    -2.336663537   0.53536717     
128            5.362578869    6.11398387     5.229052782    14.01200838    -2.489960336   
1024           14.10357333    14.0685565     9.240571122    -0.248283326   -34.48063899   
10000          322.740033     89.83533496    48.29404964    -72.1647996    -85.03623824   
100000         3757.874566    1168.889193    721.571655     -68.89493856   -80.79841032   
1000000        45415.52301    11909.35397    8130.106271    -73.77690891   -82.09839779   
10000000       539107.6565    217536.211     171638.298     -59.64883667   -68.16251894   
100000000      6410786.629    2307972.193    1874841.69     -63.99861161   -70.75488862   
                                                                                          
Select (unsorted topk):                                                                           
size           default        avx2           avx512         AVX2 speedup % AVX512 speedup %                                      
16             4.046328783    4.075162649    4.120069742    0.712593258    1.822416396    
128            4.912573099    5.651566267    4.852572441    15.04289408    -1.221369267   
1024           9.33410556     7.444588822    6.450861102    -20.24314731   -30.88934917   
10000          131.0603339    30.31828351    21.59984892    -76.8669264    -83.5191562    
100000         1273.479035    461.2905271    368.7236242    -63.77714006   -71.04596038   
1000000        12720.43527    4615.330229    6321.014143    -63.71719891   -50.30819302   
10000000       253137.2547    130307.7698    125634.7179    -48.52287945   -50.36893401   
100000000      2394076.347    1226446.152    1257464.17     -48.77163575   -47.47602048   

With uniformly distributed (from 0 to 10e9) int32:

Partial Sort (sorted topk):                                                                           
size           default        avx2           avx512         AVX2 speedup % AVX512 speedup %
16             4.159518957    4.193886042    4.062754393    0.82622738     -2.326340269   
128            4.946233034    6.055155277    5.076540947    22.41953089    2.634487941    
1024           10.11158738    12.40083982    8.094132173    22.63989176    -19.95191391   
10000          235.6421399    81.19698569    47.26006824    -65.54224736   -79.94413552   
100000         2726.637111    1325.142324    692.2358897    -51.40012146   -74.61210049   
1000000        32515.39025    13960.33343    7689.116048    -57.06545939   -76.35237963   
10000000       461327.1713    221366.8823    176379.2038    -52.01520828   -61.76700295   
100000000      5129393.339    2428071.022    1890691.996    -52.66358297   -63.14004658   
                                                                                          
Select (unsorted topk):                                                                           
size           default        avx2           avx512         AVX2 speedup % AVX512 speedup %                             
16             7.28207159     7.344153643    7.334493876    0.852532847    0.719881485    
128            7.539570808    9.854883432    8.218926668    30.70881198    9.010537563    
1024           14.03857871    19.15253601    10.045604      36.427885      -28.4428701    
10000          146.630404     48.96809058    29.91605115    -66.60440861   -79.59764801   
100000         1304.309364    585.7699348    807.1453417    -55.08964737   -38.11703236   
1000000        16298.37354    7968.290179    8426.867279    -51.10990578   -48.29626859   
10000000       276746.4638    175631.3324    161258.0299    -36.5370997    -41.73077129   
100000000      3139316.559    1701539.516    1615918.875    -45.79904624   -48.5264119  

sterrettm2 avatar May 09 '24 17:05 sterrettm2