Bessels.jl

Use better sin_sum for F32

Open heltonmc opened this issue 1 year ago • 6 comments

This fixes the accuracy problem reported in #90; the performance side of that issue was fixed in #92.

# before
julia> Bessels.besselj0(328049.34f0)
-0.0013240778f0

# after
julia> Bessels.besselj0(328049.34f0)
-0.0013258625f0

# Float64 number
julia> Bessels.besselj0(Float64(328049.34f0))
-0.001325862383187567

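This significantly improves accuracy: measured against the Float64 value, the relative error of the Float32 result drops from about 1.3e-3 to about 1e-7. The naive version is, of course, faster.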
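As a rough illustration of why a naive sine of a sum can lose accuracy at this argument size, here is a minimal sketch. It assumes the error comes from rounding the summed argument in Float32 before the trig call; near 328049 adjacent Float32 values are 2^-5 = 0.03125 apart, so a phase shift added to `x` is rounded coarsely. The helpers `naive_sin_sum` and `split_sin_sum` are hypothetical names for this example and are not the Bessels.jl implementation of `sin_sum`.

```julia
# Hypothetical helpers for illustration only; not the Bessels.jl code.

# Naive: the Float32 sum x + y is rounded before sin is applied.
naive_sin_sum(x::Float32, y::Float32) = sin(x + y)

# Angle-addition identity: sin(x + y) = sin(x)cos(y) + cos(x)sin(y).
# Each trig call sees its argument unrounded, so no phase is lost in a sum.
split_sin_sum(x::Float32, y::Float32) = sin(x) * cos(y) + cos(x) * sin(y)

x = 328049.34f0
y = -Float32(pi) / 4            # example phase shift (an assumption)

# Float64 baseline evaluated on the same two Float32 inputs.
reference = sin(Float64(x) + Float64(y))

err_naive = abs(naive_sin_sum(x, y) - reference)
err_split = abs(split_sin_sum(x, y) - reference)

println("Float32 spacing near x: ", eps(x))
println("naive error: ", err_naive)
println("split error: ", err_split)
```

The split form costs extra trig evaluations, which is consistent with a more accurate version being slower than the naive one.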

# Master
julia> @benchmark besselj0(x) setup=(x=Float32(rand()*100 + 20.0))
BenchmarkTools.Trial: 10000 samples with 998 evaluations.
 Range (min … max):  14.996 ns … 29.645 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     16.055 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   15.950 ns ±  0.458 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                               ▂      ▂█       
  ▂▂▂▂▂▂▂▂▂▂▂▂▃▄▂▂▂▃▃▂▃▃▄▃▂▁▁▁▁▁▁▁▁▁▁▁▂▂▃▃▃▃▃▃▃█▆▂▂▂▃▆██▆▃▅▇▄ ▃
  15 ns           Histogram: frequency by time        16.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.


# this PR
julia> @benchmark besselj0(x) setup=(x=Float32(rand()*100 + 20.0))
BenchmarkTools.Trial: 10000 samples with 997 evaluations.
 Range (min … max):  18.653 ns … 28.444 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.585 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.705 ns ±  0.369 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                           ▆       █       ▁                   
  ▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▃▂▂▁▁▁▁▂▅█▂▁▁▁▁▁▇█▃▁▁▂▁▁▂█▄▂▁▁▁▁▁▆▆▂▂▁▁▁▁▄▇ ▃
  18.7 ns         Histogram: frequency by time        20.3 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

So this PR is about 20% slower (median 19.6 ns vs. 16.1 ns), but the performance hit is necessary here because the previous result is inaccurate.

heltonmc, Apr 21 '23 20:04