Added support for SIMD.jl; WIP

Open chriselrod opened this issue 6 years ago • 3 comments

  • [ ] Add tests for Vec{N,T} where T <: FloatTypes.
  • [ ] Make sure all of these tests also pass.
  • [ ] Investigate performance regressions vs the SLEEF C library.
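
For the first two checklist items, one possible shape for the lane-wise tests is sketched below. This is only a sketch under stated assumptions: to stay runnable without this branch it uses a plain NTuple as a stand-in for Vec{N,T}, and `lanewise_ok` and `vexp` are hypothetical names that exist in neither SLEEF.jl nor SIMD.jl. The idea is to compare every lane of the vector result against the scalar function applied to that lane.

```julia
using Test

# Hypothetical helper: true if every lane of the vector result agrees with
# the scalar function applied to that lane, within `rtol`.
function lanewise_ok(fvec, fscalar, x::NTuple{N,T};
                     rtol = 4eps(T)) where {N,T<:Union{Float32,Float64}}
    y = fvec(x)
    all(isapprox(y[i], fscalar(x[i]); rtol = rtol) for i in 1:N)
end

# Stand-in for a vectorized kernel (assumption); in the real tests this
# would be the SLEEF.jl method that accepts a Vec{N,T}.
vexp(x::NTuple) = map(exp, x)

@testset "lane-wise agreement" begin
    for T in (Float32, Float64), N in (4, 8, 16)
        x = ntuple(i -> T(i) / 4, N)   # simple deterministic inputs
        @test lanewise_ok(vexp, exp, x)
    end
end
```

In the actual test suite the tolerance would presumably be tied to the accuracy bound each function claims rather than a fixed multiple of eps.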

Overview of this PR: The C SLEEF (SIMD Library for Evaluating Elementary Functions) library provides vectorized elementary functions, so it makes sense for SLEEF.jl to support SIMD.jl's Vec{N,T} vector type.

This PR provides preliminary support.

using SIMD, SLEEF, SLEEFwrap, BenchmarkTools, Random
@inline extract(x) = x.elts # 64-byte vectors segfault when returned while wrapped in a struct
sv8 = Vec{8,Float32}(ntuple(Val(8)) do x Core.VecElement(randexp(Float32)) end)
dv4 = Vec{4,Float64}(ntuple(Val(4)) do x Core.VecElement(randexp(Float64)) end)
sv16 = Vec{16,Float32}(ntuple(Val(16)) do x Core.VecElement(randexp(Float32)) end)
dv8 = Vec{8,Float64}(ntuple(Val(8)) do x Core.VecElement(randexp(Float64)) end)
function bench(jl, c, x)
    display(@benchmark extract($jl($x)))
    display(@benchmark $c(extract($x)))
end

Testing a bunch of functions. For each call to bench, the first trial is SLEEF.jl and the second is SLEEFwrap.jl.

exp

julia> bench(SLEEF.exp, SLEEFwrap.exp, sv8)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     5.545 ns (0.00% GC)
  median time:      5.686 ns (0.00% GC)
  mean time:        5.816 ns (0.00% GC)
  maximum time:     23.974 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     4.689 ns (0.00% GC)
  median time:      4.722 ns (0.00% GC)
  mean time:        4.740 ns (0.00% GC)
  maximum time:     23.272 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> bench(SLEEF.exp, SLEEFwrap.exp, dv4)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     7.408 ns (0.00% GC)
  median time:      7.449 ns (0.00% GC)
  mean time:        7.467 ns (0.00% GC)
  maximum time:     24.513 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     6.615 ns (0.00% GC)
  median time:      6.722 ns (0.00% GC)
  mean time:        6.737 ns (0.00% GC)
  maximum time:     20.488 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> bench(SLEEF.exp, SLEEFwrap.exp, sv16)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  4677168108795565027
  --------------
  minimum time:     5.691 ns (0.00% GC)
  median time:      5.731 ns (0.00% GC)
  mean time:        5.779 ns (0.00% GC)
  maximum time:     22.034 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  4677168108795565027
  --------------
  minimum time:     5.256 ns (0.00% GC)
  median time:      5.287 ns (0.00% GC)
  mean time:        5.297 ns (0.00% GC)
  maximum time:     14.432 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

julia> bench(SLEEF.exp, SLEEFwrap.exp, dv8)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  4613273474792594525
  --------------
  minimum time:     7.284 ns (0.00% GC)
  median time:      7.321 ns (0.00% GC)
  mean time:        7.336 ns (0.00% GC)
  maximum time:     25.833 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  4613273474792594525
  --------------
  minimum time:     11.036 ns (0.00% GC)
  median time:      11.553 ns (0.00% GC)
  mean time:        11.370 ns (0.00% GC)
  maximum time:     38.117 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999

log

julia> bench(SLEEF.log, SLEEFwrap.log, sv8)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     15.225 ns (0.00% GC)
  median time:      15.276 ns (0.00% GC)
  mean time:        15.310 ns (0.00% GC)
  maximum time:     31.264 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     9.967 ns (0.00% GC)
  median time:      10.042 ns (0.00% GC)
  mean time:        10.065 ns (0.00% GC)
  maximum time:     32.280 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999

julia> bench(SLEEF.log, SLEEFwrap.log, dv4)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     16.762 ns (0.00% GC)
  median time:      16.993 ns (0.00% GC)
  mean time:        16.964 ns (0.00% GC)
  maximum time:     30.792 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     12.829 ns (0.00% GC)
  median time:      12.873 ns (0.00% GC)
  mean time:        12.897 ns (0.00% GC)
  maximum time:     27.613 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999

julia> bench(SLEEF.log, SLEEFwrap.log, sv16)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  4552958378306737260
  --------------
  minimum time:     16.331 ns (0.00% GC)
  median time:      16.536 ns (0.00% GC)
  mean time:        16.543 ns (0.00% GC)
  maximum time:     42.043 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  4552958378306737260
  --------------
  minimum time:     8.060 ns (0.00% GC)
  median time:      8.115 ns (0.00% GC)
  mean time:        8.130 ns (0.00% GC)
  maximum time:     31.205 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999

julia> bench(SLEEF.log, SLEEFwrap.log, dv8)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  -4651049139759164439
  --------------
  minimum time:     18.395 ns (0.00% GC)
  median time:      18.477 ns (0.00% GC)
  mean time:        18.613 ns (0.00% GC)
  maximum time:     45.013 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     997
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  -4651049139759164439
  --------------
  minimum time:     11.021 ns (0.00% GC)
  median time:      11.084 ns (0.00% GC)
  mean time:        11.114 ns (0.00% GC)
  maximum time:     35.427 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999

sin

julia> bench(SLEEF.sin, SLEEFwrap.sin, sv8)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     19.354 ns (0.00% GC)
  median time:      19.471 ns (0.00% GC)
  mean time:        19.612 ns (0.00% GC)
  maximum time:     37.226 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     997
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     9.906 ns (0.00% GC)
  median time:      9.953 ns (0.00% GC)
  mean time:        9.972 ns (0.00% GC)
  maximum time:     21.988 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999

julia> bench(SLEEF.sin, SLEEFwrap.sin, dv4)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     28.163 ns (0.00% GC)
  median time:      28.265 ns (0.00% GC)
  mean time:        28.329 ns (0.00% GC)
  maximum time:     52.633 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     995
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     10.484 ns (0.00% GC)
  median time:      10.541 ns (0.00% GC)
  mean time:        10.568 ns (0.00% GC)
  maximum time:     27.162 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999

julia> bench(SLEEF.sin, SLEEFwrap.sin, sv16)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  4569599948461514222
  --------------
  minimum time:     20.364 ns (0.00% GC)
  median time:      20.458 ns (0.00% GC)
  mean time:        20.502 ns (0.00% GC)
  maximum time:     47.938 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     997
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  4569599948461514222
  --------------
  minimum time:     10.426 ns (0.00% GC)
  median time:      10.565 ns (0.00% GC)
  mean time:        10.587 ns (0.00% GC)
  maximum time:     33.371 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999

julia> bench(SLEEF.sin, SLEEFwrap.sin, dv8)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  4605730538145129761
  --------------
  minimum time:     28.796 ns (0.00% GC)
  median time:      28.919 ns (0.00% GC)
  mean time:        29.123 ns (0.00% GC)
  maximum time:     55.898 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     995
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  4605730538145129761
  --------------
  minimum time:     11.913 ns (0.00% GC)
  median time:      12.026 ns (0.00% GC)
  mean time:        12.050 ns (0.00% GC)
  maximum time:     33.233 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999

tan

julia> bench(SLEEF.tan, SLEEFwrap.tan, sv8)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     36.797 ns (0.00% GC)
  median time:      36.895 ns (0.00% GC)
  mean time:        36.988 ns (0.00% GC)
  maximum time:     58.675 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     992
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     16.273 ns (0.00% GC)
  median time:      16.346 ns (0.00% GC)
  mean time:        16.381 ns (0.00% GC)
  maximum time:     34.868 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998

julia> bench(SLEEF.tan, SLEEFwrap.tan, dv4)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     51.512 ns (0.00% GC)
  median time:      51.640 ns (0.00% GC)
  mean time:        52.010 ns (0.00% GC)
  maximum time:     73.956 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     986
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     14.053 ns (0.00% GC)
  median time:      14.161 ns (0.00% GC)
  mean time:        14.179 ns (0.00% GC)
  maximum time:     31.734 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998

julia> bench(SLEEF.tan, SLEEFwrap.tan, sv16)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  -4606600161933539213
  --------------
  minimum time:     38.064 ns (0.00% GC)
  median time:      38.202 ns (0.00% GC)
  mean time:        38.285 ns (0.00% GC)
  maximum time:     62.710 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     992
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  -4606600161933539213
  --------------
  minimum time:     18.630 ns (0.00% GC)
  median time:      18.712 ns (0.00% GC)
  mean time:        18.756 ns (0.00% GC)
  maximum time:     44.121 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     997

julia> bench(SLEEF.tan, SLEEFwrap.tan, dv8)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  4609617611958208877
  --------------
  minimum time:     55.713 ns (0.00% GC)
  median time:      55.881 ns (0.00% GC)
  mean time:        56.035 ns (0.00% GC)
  maximum time:     78.817 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     984
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  4609617611958208877
  --------------
  minimum time:     17.800 ns (0.00% GC)
  median time:      17.898 ns (0.00% GC)
  mean time:        18.053 ns (0.00% GC)
  maximum time:     42.916 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     998

cbrt

julia> bench(SLEEF.cbrt, SLEEFwrap.cbrt, sv8)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     31.845 ns (0.00% GC)
  median time:      32.018 ns (0.00% GC)
  mean time:        32.143 ns (0.00% GC)
  maximum time:     54.500 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     994
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     25.222 ns (0.00% GC)
  median time:      26.324 ns (0.00% GC)
  mean time:        26.364 ns (0.00% GC)
  maximum time:     43.927 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     996

julia> bench(SLEEF.cbrt, SLEEFwrap.cbrt, dv4)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     36.175 ns (0.00% GC)
  median time:      36.303 ns (0.00% GC)
  mean time:        36.564 ns (0.00% GC)
  maximum time:     57.701 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     993
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     28.349 ns (0.00% GC)
  median time:      29.205 ns (0.00% GC)
  mean time:        29.250 ns (0.00% GC)
  maximum time:     46.513 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     995

julia> bench(SLEEF.cbrt, SLEEFwrap.cbrt, sv16)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  4584898609104978811
  --------------
  minimum time:     34.463 ns (0.00% GC)
  median time:      34.570 ns (0.00% GC)
  mean time:        34.634 ns (0.00% GC)
  maximum time:     58.556 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     993
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  4584898609104978811
  --------------
  minimum time:     23.273 ns (0.00% GC)
  median time:      25.731 ns (0.00% GC)
  mean time:        25.492 ns (0.00% GC)
  maximum time:     50.618 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     996

julia> bench(SLEEF.cbrt, SLEEFwrap.cbrt, dv8)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  4607167657796590655
  --------------
  minimum time:     42.291 ns (0.00% GC)
  median time:      42.392 ns (0.00% GC)
  mean time:        42.476 ns (0.00% GC)
  maximum time:     65.205 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     990
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  4607167657796590655
  --------------
  minimum time:     26.524 ns (0.00% GC)
  median time:      26.741 ns (0.00% GC)
  mean time:        26.800 ns (0.00% GC)
  maximum time:     46.431 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     995

Performance is currently often 2 to 3x worse than SLEEFwrap.jl (which wraps the C library), especially for sin and tan. exp is close to parity, and for dv8 it is faster than the C version.
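
To put a number on that, the slowdown per function and input can be computed from the minimum times reported above. A small sketch follows; `MIN_TIMES`, `INPUTS` and `slowdown` are names made up here for illustration, and the numbers are copied from the benchmark output above:

```julia
# Minimum times in ns copied from the benchmark output above, stored as
# (SLEEF.jl, SLEEFwrap.jl) pairs in the order sv8, dv4, sv16, dv8.
const INPUTS = (:sv8, :dv4, :sv16, :dv8)
const MIN_TIMES = [
    :exp  => [(5.545, 4.689),   (7.408, 6.615),   (5.691, 5.256),   (7.284, 11.036)],
    :log  => [(15.225, 9.967),  (16.762, 12.829), (16.331, 8.060),  (18.395, 11.021)],
    :sin  => [(19.354, 9.906),  (28.163, 10.484), (20.364, 10.426), (28.796, 11.913)],
    :tan  => [(36.797, 16.273), (51.512, 14.053), (38.064, 18.630), (55.713, 17.800)],
    :cbrt => [(31.845, 25.222), (36.175, 28.349), (34.463, 23.273), (42.291, 26.524)],
]

# Slowdown of the Julia port relative to the C wrapper (> 1 means slower).
slowdown(t) = t[1] / t[2]

for (f, times) in MIN_TIMES
    ratios = round.(slowdown.(times); digits = 2)
    println(rpad(string(f), 6), join(string.(INPUTS, "=", ratios), "  "))
end
```

Minimum times are used here because they are the least sensitive to noise; medians from the same output would work equally well.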

chriselrod, Dec 25 '18 08:12

Coverage increased (+36.7%) to 65.182% when pulling 8b83a5ac5c51cd3da68625f0de93640d29783b9e on chriselrod:master into b089af504632f29b694c119f9d4fbbfe0441547b on musm:master.

coveralls, Dec 25 '18 09:12

Coverage increased (+36.6%) to 65.074% when pulling e57ed3c1891a5438078974a4f8fc01936a31200c on chriselrod:master into b089af504632f29b694c119f9d4fbbfe0441547b on musm:master.

coveralls, Dec 25 '18 09:12

Awesome progress. Can you please remove the Manifest file?

musm, Dec 25 '18 17:12