SLEEF.jl
Added support for SIMD.jl; WIP
- [ ] Add tests for Vec{N,T} where T <: FloatTypes.
- [ ] Make sure all of these tests also pass.
- [ ] Investigate performance regressions vs the SLEEF C library.
Overview of this PR:
The C library SLEEF (SIMD Library for Evaluating Elementary Functions) provides vectorized elementary functions, so it makes sense for SLEEF.jl to support SIMD.jl's Vec{N,T} vector type.
This PR provides preliminary support.
using SIMD, SLEEF, SLEEFwrap, BenchmarkTools, Random
# Unwrap the Vec to its raw tuple: 64-byte vectors segfault when returned while wrapped in a struct.
@inline extract(x) = x.elts
sv8 = Vec{8,Float32}(ntuple(_ -> Core.VecElement(randexp(Float32)), Val(8)))
dv4 = Vec{4,Float64}(ntuple(_ -> Core.VecElement(randexp(Float64)), Val(4)))
sv16 = Vec{16,Float32}(ntuple(_ -> Core.VecElement(randexp(Float32)), Val(16)))
dv8 = Vec{8,Float64}(ntuple(_ -> Core.VecElement(randexp(Float64)), Val(8)))
# Benchmark the Julia implementation (jl) first, then the wrapped C function (c).
function bench(jl, c, x)
    display(@benchmark extract($jl($x)))
    display(@benchmark $c(extract($x)))
end
Testing a bunch of functions:
exp:
julia> bench(SLEEF.exp, SLEEFwrap.exp, sv8)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 5.545 ns (0.00% GC)
median time: 5.686 ns (0.00% GC)
mean time: 5.816 ns (0.00% GC)
maximum time: 23.974 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 4.689 ns (0.00% GC)
median time: 4.722 ns (0.00% GC)
mean time: 4.740 ns (0.00% GC)
maximum time: 23.272 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
julia> bench(SLEEF.exp, SLEEFwrap.exp, dv4)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 7.408 ns (0.00% GC)
median time: 7.449 ns (0.00% GC)
mean time: 7.467 ns (0.00% GC)
maximum time: 24.513 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 999
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 6.615 ns (0.00% GC)
median time: 6.722 ns (0.00% GC)
mean time: 6.737 ns (0.00% GC)
maximum time: 20.488 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
julia> bench(SLEEF.exp, SLEEFwrap.exp, sv16)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 4677168108795565027
--------------
minimum time: 5.691 ns (0.00% GC)
median time: 5.731 ns (0.00% GC)
mean time: 5.779 ns (0.00% GC)
maximum time: 22.034 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 4677168108795565027
--------------
minimum time: 5.256 ns (0.00% GC)
median time: 5.287 ns (0.00% GC)
mean time: 5.297 ns (0.00% GC)
maximum time: 14.432 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 1000
julia> bench(SLEEF.exp, SLEEFwrap.exp, dv8)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 4613273474792594525
--------------
minimum time: 7.284 ns (0.00% GC)
median time: 7.321 ns (0.00% GC)
mean time: 7.336 ns (0.00% GC)
maximum time: 25.833 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 999
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 4613273474792594525
--------------
minimum time: 11.036 ns (0.00% GC)
median time: 11.553 ns (0.00% GC)
mean time: 11.370 ns (0.00% GC)
maximum time: 38.117 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 999
log
julia> bench(SLEEF.log, SLEEFwrap.log, sv8)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 15.225 ns (0.00% GC)
median time: 15.276 ns (0.00% GC)
mean time: 15.310 ns (0.00% GC)
maximum time: 31.264 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 998
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 9.967 ns (0.00% GC)
median time: 10.042 ns (0.00% GC)
mean time: 10.065 ns (0.00% GC)
maximum time: 32.280 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 999
julia> bench(SLEEF.log, SLEEFwrap.log, dv4)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 16.762 ns (0.00% GC)
median time: 16.993 ns (0.00% GC)
mean time: 16.964 ns (0.00% GC)
maximum time: 30.792 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 998
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 12.829 ns (0.00% GC)
median time: 12.873 ns (0.00% GC)
mean time: 12.897 ns (0.00% GC)
maximum time: 27.613 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 999
julia> bench(SLEEF.log, SLEEFwrap.log, sv16)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 4552958378306737260
--------------
minimum time: 16.331 ns (0.00% GC)
median time: 16.536 ns (0.00% GC)
mean time: 16.543 ns (0.00% GC)
maximum time: 42.043 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 998
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 4552958378306737260
--------------
minimum time: 8.060 ns (0.00% GC)
median time: 8.115 ns (0.00% GC)
mean time: 8.130 ns (0.00% GC)
maximum time: 31.205 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 999
julia> bench(SLEEF.log, SLEEFwrap.log, dv8)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: -4651049139759164439
--------------
minimum time: 18.395 ns (0.00% GC)
median time: 18.477 ns (0.00% GC)
mean time: 18.613 ns (0.00% GC)
maximum time: 45.013 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 997
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: -4651049139759164439
--------------
minimum time: 11.021 ns (0.00% GC)
median time: 11.084 ns (0.00% GC)
mean time: 11.114 ns (0.00% GC)
maximum time: 35.427 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 999
sin
julia> bench(SLEEF.sin, SLEEFwrap.sin, sv8)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 19.354 ns (0.00% GC)
median time: 19.471 ns (0.00% GC)
mean time: 19.612 ns (0.00% GC)
maximum time: 37.226 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 997
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 9.906 ns (0.00% GC)
median time: 9.953 ns (0.00% GC)
mean time: 9.972 ns (0.00% GC)
maximum time: 21.988 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 999
julia> bench(SLEEF.sin, SLEEFwrap.sin, dv4)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 28.163 ns (0.00% GC)
median time: 28.265 ns (0.00% GC)
mean time: 28.329 ns (0.00% GC)
maximum time: 52.633 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 995
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 10.484 ns (0.00% GC)
median time: 10.541 ns (0.00% GC)
mean time: 10.568 ns (0.00% GC)
maximum time: 27.162 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 999
julia> bench(SLEEF.sin, SLEEFwrap.sin, sv16)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 4569599948461514222
--------------
minimum time: 20.364 ns (0.00% GC)
median time: 20.458 ns (0.00% GC)
mean time: 20.502 ns (0.00% GC)
maximum time: 47.938 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 997
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 4569599948461514222
--------------
minimum time: 10.426 ns (0.00% GC)
median time: 10.565 ns (0.00% GC)
mean time: 10.587 ns (0.00% GC)
maximum time: 33.371 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 999
julia> bench(SLEEF.sin, SLEEFwrap.sin, dv8)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 4605730538145129761
--------------
minimum time: 28.796 ns (0.00% GC)
median time: 28.919 ns (0.00% GC)
mean time: 29.123 ns (0.00% GC)
maximum time: 55.898 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 995
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 4605730538145129761
--------------
minimum time: 11.913 ns (0.00% GC)
median time: 12.026 ns (0.00% GC)
mean time: 12.050 ns (0.00% GC)
maximum time: 33.233 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 999
tan
julia> bench(SLEEF.tan, SLEEFwrap.tan, sv8)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 36.797 ns (0.00% GC)
median time: 36.895 ns (0.00% GC)
mean time: 36.988 ns (0.00% GC)
maximum time: 58.675 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 992
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 16.273 ns (0.00% GC)
median time: 16.346 ns (0.00% GC)
mean time: 16.381 ns (0.00% GC)
maximum time: 34.868 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 998
julia> bench(SLEEF.tan, SLEEFwrap.tan, dv4)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 51.512 ns (0.00% GC)
median time: 51.640 ns (0.00% GC)
mean time: 52.010 ns (0.00% GC)
maximum time: 73.956 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 986
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 14.053 ns (0.00% GC)
median time: 14.161 ns (0.00% GC)
mean time: 14.179 ns (0.00% GC)
maximum time: 31.734 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 998
julia> bench(SLEEF.tan, SLEEFwrap.tan, sv16)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: -4606600161933539213
--------------
minimum time: 38.064 ns (0.00% GC)
median time: 38.202 ns (0.00% GC)
mean time: 38.285 ns (0.00% GC)
maximum time: 62.710 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 992
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: -4606600161933539213
--------------
minimum time: 18.630 ns (0.00% GC)
median time: 18.712 ns (0.00% GC)
mean time: 18.756 ns (0.00% GC)
maximum time: 44.121 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 997
julia> bench(SLEEF.tan, SLEEFwrap.tan, dv8)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 4609617611958208877
--------------
minimum time: 55.713 ns (0.00% GC)
median time: 55.881 ns (0.00% GC)
mean time: 56.035 ns (0.00% GC)
maximum time: 78.817 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 984
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 4609617611958208877
--------------
minimum time: 17.800 ns (0.00% GC)
median time: 17.898 ns (0.00% GC)
mean time: 18.053 ns (0.00% GC)
maximum time: 42.916 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 998
cbrt
julia> bench(SLEEF.cbrt, SLEEFwrap.cbrt, sv8)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 31.845 ns (0.00% GC)
median time: 32.018 ns (0.00% GC)
mean time: 32.143 ns (0.00% GC)
maximum time: 54.500 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 994
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 25.222 ns (0.00% GC)
median time: 26.324 ns (0.00% GC)
mean time: 26.364 ns (0.00% GC)
maximum time: 43.927 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 996
julia> bench(SLEEF.cbrt, SLEEFwrap.cbrt, dv4)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 36.175 ns (0.00% GC)
median time: 36.303 ns (0.00% GC)
mean time: 36.564 ns (0.00% GC)
maximum time: 57.701 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 993
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 28.349 ns (0.00% GC)
median time: 29.205 ns (0.00% GC)
mean time: 29.250 ns (0.00% GC)
maximum time: 46.513 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 995
julia> bench(SLEEF.cbrt, SLEEFwrap.cbrt, sv16)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 4584898609104978811
--------------
minimum time: 34.463 ns (0.00% GC)
median time: 34.570 ns (0.00% GC)
mean time: 34.634 ns (0.00% GC)
maximum time: 58.556 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 993
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 4584898609104978811
--------------
minimum time: 23.273 ns (0.00% GC)
median time: 25.731 ns (0.00% GC)
mean time: 25.492 ns (0.00% GC)
maximum time: 50.618 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 996
julia> bench(SLEEF.cbrt, SLEEFwrap.cbrt, dv8)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 4607167657796590655
--------------
minimum time: 42.291 ns (0.00% GC)
median time: 42.392 ns (0.00% GC)
mean time: 42.476 ns (0.00% GC)
maximum time: 65.205 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 990
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 4607167657796590655
--------------
minimum time: 26.524 ns (0.00% GC)
median time: 26.741 ns (0.00% GC)
mean time: 26.800 ns (0.00% GC)
maximum time: 46.431 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 995
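Performance is currently often 2 to 3x worse than SLEEFwrap.jl (which wraps the C library). By minimum time, the gap is largest for sin and tan (about 2x to 3.7x slower) and smaller for log (about 1.3x to 2x) and cbrt (about 1.3x to 1.6x). exp is within about 20%, and SLEEF.jl is faster than SLEEFwrap.jl for exp on dv8 (7.284 ns vs 11.036 ns).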
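To make the comparison easier to scan, here is a small sketch that tabulates the slowdown per function and input from the minimum times reported above. The names `MIN_TIMES_NS`, `slowdown`, and `print_slowdowns` are made up for this illustration and are not part of SLEEF.jl or SLEEFwrap.jl. The numbers are copied from the benchmark output, with the first trial of each pair taken as SLEEF.jl and the second as SLEEFwrap.jl.

```julia
# Minimum times (ns) copied from the benchmark output above.
# Each entry is (function, input, SLEEF.jl time, SLEEFwrap.jl time).
const MIN_TIMES_NS = [
    (:exp,  :sv8,   5.545,  4.689),
    (:exp,  :dv4,   7.408,  6.615),
    (:exp,  :sv16,  5.691,  5.256),
    (:exp,  :dv8,   7.284, 11.036),
    (:log,  :sv8,  15.225,  9.967),
    (:log,  :dv4,  16.762, 12.829),
    (:log,  :sv16, 16.331,  8.060),
    (:log,  :dv8,  18.395, 11.021),
    (:sin,  :sv8,  19.354,  9.906),
    (:sin,  :dv4,  28.163, 10.484),
    (:sin,  :sv16, 20.364, 10.426),
    (:sin,  :dv8,  28.796, 11.913),
    (:tan,  :sv8,  36.797, 16.273),
    (:tan,  :dv4,  51.512, 14.053),
    (:tan,  :sv16, 38.064, 18.630),
    (:tan,  :dv8,  55.713, 17.800),
    (:cbrt, :sv8,  31.845, 25.222),
    (:cbrt, :dv4,  36.175, 28.349),
    (:cbrt, :sv16, 34.463, 23.273),
    (:cbrt, :dv8,  42.291, 26.524),
]

# A ratio above 1 means the pure-Julia call took longer than the wrapped C call.
slowdown(jl_ns, c_ns) = jl_ns / c_ns

# Print one line per benchmark: function, input, and slowdown rounded to 2 digits.
function print_slowdowns(rows = MIN_TIMES_NS)
    for (f, input, jl_ns, c_ns) in rows
        ratio = round(slowdown(jl_ns, c_ns); digits = 2)
        println(rpad(string(f), 6), rpad(string(input), 6), ratio, "x")
    end
end

print_slowdowns()
```

Minimum time is used here because it is usually the least noisy of the statistics BenchmarkTools reports. The same table could be built from the medians instead.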
Coverage increased (+36.7%) to 65.182% when pulling 8b83a5ac5c51cd3da68625f0de93640d29783b9e on chriselrod:master into b089af504632f29b694c119f9d4fbbfe0441547b on musm:master.
Coverage increased (+36.6%) to 65.074% when pulling e57ed3c1891a5438078974a4f8fc01936a31200c on chriselrod:master into b089af504632f29b694c119f9d4fbbfe0441547b on musm:master.
Awesome progress. Can you please remove the Manifest file?