Bessels.jl
Use better sin_sum for F32
This fixes #90. Performance was addressed separately in #92.
# before
julia> Bessels.besselj0(328049.34f0)
-0.0013240778f0
# after
julia> Bessels.besselj0(328049.34f0)
-0.0013258625f0
# Float64 number
julia> Bessels.besselj0(Float64(328049.34f0))
-0.001325862383187567
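To put numbers on the difference, here is a minimal sketch that computes the relative error of both Float32 results against the Float64 value above. The helper `relerr` is a hypothetical name introduced only for this illustration and is not part of Bessels.jl; the three values are copied from the REPL output.

```julia
# Values copied from the REPL output above.
before    = -0.0013240778f0          # Float32 result before this change
after     = -0.0013258625f0          # Float32 result after this change
reference = -0.001325862383187567    # Float64 result, treated as the reference

# Hypothetical helper: relative error of an approximation against a reference.
relerr(approx, ref) = abs(Float64(approx) - ref) / abs(ref)

rel_before = relerr(before, reference)
rel_after  = relerr(after, reference)

println("relative error before: ", rel_before)
println("relative error after:  ", rel_after)
println("eps(Float32):          ", eps(Float32))
```

The old result is off by roughly 1e-3 in relative terms, while the new one is within about 1e-7 of the reference, which is comparable to `eps(Float32)`.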
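This significantly improves accuracy: the Float32 result now agrees with the Float64 value to about seven significant digits, where before only the first three were correct. The naive version is, of course, faster.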
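One plausible reason a naive Float32 sum loses accuracy at this magnitude is the spacing between adjacent Float32 values near `x`. The sketch below is only an illustration of that effect, not the `sin_sum` used in this PR; the phase term `y` is a hypothetical value chosen for the example, and the Float64 sum stands in for any more careful way of adding the two terms.

```julia
x = 328049.34f0
y = -Float32(pi) / 4        # hypothetical phase term, for illustration only

# Spacing between adjacent Float32 values near x.
spacing = eps(x)

# Naive: the sum is rounded to Float32 before it reaches sin.
naive_arg = x + y

# Wider: form the same sum in Float64 so the small term is not rounded away.
wide_arg = Float64(x) + Float64(y)

# How far the rounded Float32 argument is from the Float64 one, in radians.
arg_error = abs(Float64(naive_arg) - wide_arg)

println("eps(x)         = ", spacing)
println("argument error = ", arg_error)
println("sin, naive     = ", sin(naive_arg))
println("sin, Float64   = ", Float32(sin(wide_arg)))
```

Near this `x`, `eps(x)` is 0.03125, so a sum rounded to Float32 can be off by up to about 0.016 radians before `sin` is even evaluated.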
# Master
julia> @benchmark besselj0(x) setup=(x=Float32(rand()*100 + 20.0))
BenchmarkTools.Trial: 10000 samples with 998 evaluations.
Range (min … max): 14.996 ns … 29.645 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 16.055 ns ┊ GC (median): 0.00%
Time (mean ± σ): 15.950 ns ± 0.458 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▂ ▂█
▂▂▂▂▂▂▂▂▂▂▂▂▃▄▂▂▂▃▃▂▃▃▄▃▂▁▁▁▁▁▁▁▁▁▁▁▂▂▃▃▃▃▃▃▃█▆▂▂▂▃▆██▆▃▅▇▄ ▃
15 ns Histogram: frequency by time 16.2 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
# this PR
julia> @benchmark besselj0(x) setup=(x=Float32(rand()*100 + 20.0))
BenchmarkTools.Trial: 10000 samples with 997 evaluations.
Range (min … max): 18.653 ns … 28.444 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 19.585 ns ┊ GC (median): 0.00%
Time (mean ± σ): 19.705 ns ± 0.369 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▆ █ ▁
▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▃▂▂▁▁▁▁▂▅█▂▁▁▁▁▁▇█▃▁▁▂▁▁▂█▄▂▁▁▁▁▁▆▆▂▂▁▁▁▁▄▇ ▃
18.7 ns Histogram: frequency by time 20.3 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
So this PR is about 20% slower (median 16.1 ns on master vs. 19.6 ns here), but the performance hit is necessary because the previous result was inaccurate.