Added erf(x) Float64 Julia implementation
Faster than the current wrapper function call (including the Float32 call). The algorithm is based on https://github.com/ARM-software/optimized-routines/blob/master/math/erf.c
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 94.16%. Comparing base (46a2874) to head (f7470ba).
:warning: Report is 27 commits behind head on master.
Additional details and impacted files

| Coverage Diff | master | #491 | +/- |
|---|---|---|---|
| Coverage | 94.02% | 94.16% | +0.13% |
| Files | 14 | 14 | |
| Lines | 2897 | 2965 | +68 |
| Hits | 2724 | 2792 | +68 |
| Misses | 173 | 173 | |

| Flag | Coverage Δ |
|---|---|
| unittests | 94.16% <100.00%> (+0.13%) :arrow_up: |
Old:
Float64
@benchmark SpecialFunctions.erf(data) setup=(data=6*rand(Float64)-3) samples=1000000
BenchmarkTools.Trial: 217729 samples with 1000 evaluations per sample.
 Range (min … max):  6.300 ns … 283.700 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     29.500 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   21.993 ns ± 13.209 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%
Float32
@benchmark SpecialFunctions.erf(data) setup=(data=6*rand(Float32)-3) samples=1000000
BenchmarkTools.Trial: 312732 samples with 1000 evaluations per sample.
 Range (min … max):  4.300 ns … 125.100 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.900 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   15.035 ns ±  7.951 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%
New:
Float64
@benchmark erf(data) setup=(data=6*rand(Float64)-3) samples=1000000
BenchmarkTools.Trial: 507504 samples with 1000 evaluations per sample.
 Range (min … max):  5.400 ns … 4.890 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     8.700 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.775 ns ± 9.855 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%
Float32
@benchmark Float32(erf(data)) setup=(data=6*rand(Float64)-3) samples=1000000
BenchmarkTools.Trial: 526797 samples with 1000 evaluations per sample.
 Range (min … max):  5.400 ns … 195.500 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     8.800 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.521 ns ± 2.236 ns    ┊ GC (mean ± σ):  0.00% ± 0.00%
A Float32 implementation is available, but it is not faster than the Float64 version because of an exp() call. The Float64 version is still faster than the old Float32 one.
I still need to clean up the polynomial evaluations. The code could also use more organization.
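One way to tidy up polynomial evaluations in Julia is `evalpoly`, which applies Horner's scheme to a tuple of coefficients. The sketch below is only illustrative: `ERF_TAYLOR` and `erf_small` are hypothetical names, and the coefficients are the Taylor series of erf around 0, not the tables used in this PR or in the ARM routine.

```julia
# Hypothetical coefficients: Taylor series of erf(x) / (2x / sqrt(pi)) in powers of x^2.
# These are NOT the coefficients used in this PR.
const ERF_TAYLOR = (1.0, -1/3, 1/10, -1/42, 1/216, -1/1320)

# Approximate erf for small |x|. evalpoly evaluates the polynomial in x^2
# with Horner's scheme, so the odd powers come from the final multiply by x.
function erf_small(x::Float64)
    x2 = x * x
    return 2 / sqrt(pi) * x * evalpoly(x2, ERF_TAYLOR)
end
```

Keeping the coefficients in a `const` tuple lets the compiler unroll the evaluation, and it separates the tables from the control flow, which may help with the organization concern as well.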
Remaining: erfc Float64 and Float32 implementations, and the erf Float32 implementation
Given that this is faster and accurate, seems good to merge to me!
Are there any tests for edge cases/ULP in the C version that we do not do ourselves?
If so, we should also add tests for these.
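A ULP check could be sketched roughly as follows, comparing a function against its own BigFloat method. `ulp_error` and `max_ulp_error` are hypothetical helpers, not part of SpecialFunctions.jl, and the sketch assumes `f` has a BigFloat method and returns finite values at the sampled points.

```julia
# Error of f(x) in units in the last place, measured against a BigFloat reference.
# Hypothetical helper; assumes f accepts BigFloat input and f(x) is finite.
function ulp_error(f, x::AbstractFloat)
    approx = f(x)        # value in the working precision
    exact = f(big(x))    # high-precision reference at the same input
    return Float64(abs(big(approx) - exact) / eps(approx))
end

# Largest ULP error over a collection of sample points.
max_ulp_error(f, xs) = maximum(x -> ulp_error(f, x), xs)
```

With SpecialFunctions loaded, something like `max_ulp_error(erf, range(-3.0, 3.0; length=10_001))` would give a rough upper estimate to compare against the error bound documented for the C routine.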
There aren't any tests for erfc. Is that expected?
Any other changes needed?
We should probably test erfc.
Other than missing tests for Inf, looks good to me. @devmotion any further suggestions?
What do the benchmarks show with the latest iteration of this PR? In https://github.com/JuliaMath/SpecialFunctions.jl/pull/491#issuecomment-2766902350, performance with Float32 seemed to regress.
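The missing Inf tests could look something like the sketch below. It assumes SpecialFunctions.jl is installed and only checks values that follow from the mathematical definition of erf. The testset name is made up.

```julia
using SpecialFunctions, Test

@testset "erf limits and special values" begin
    for T in (Float32, Float64)
        # === also checks that the result keeps the input type
        @test erf(T(Inf)) === one(T)
        @test erf(T(-Inf)) === -one(T)
        @test isnan(erf(T(NaN)))
        @test erf(zero(T)) == zero(T)
    end
end
```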
Yes, you're correct. The old implementation was not efficient so I redid it.
Old
Float64
@benchmark erf(data) setup=(data=6*rand(Float64)-3) samples=1000000
BenchmarkTools.Trial: 217081 samples with 1000 evaluations per sample.
 Range (min … max):  6.400 ns … 973.200 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     30.100 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   22.028 ns ± 13.612 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%
Float32
@benchmark erf(data) setup=(data=6*rand(Float32)-3) samples=1000000
BenchmarkTools.Trial: 308269 samples with 1000 evaluations per sample.
 Range (min … max):  4.300 ns … 160.400 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.900 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   15.207 ns ±  8.609 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%
New
Float64
@benchmark erf(data) setup=(data=6*rand(Float64)-3) samples=1000000
BenchmarkTools.Trial: 604748 samples with 1000 evaluations per sample.
 Range (min … max):  4.700 ns … 164.400 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     7.400 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.302 ns ± 3.017 ns    ┊ GC (mean ± σ):  0.00% ± 0.00%
Float32
@benchmark erf(data) setup=(data=6*rand(Float32)-3) samples=1000000
BenchmarkTools.Trial: 641285 samples with 1000 evaluations per sample.
 Range (min … max):  4.200 ns … 204.800 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.900 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.840 ns ± 2.961 ns    ┊ GC (mean ± σ):  0.00% ± 0.00%
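For a quick read of these numbers, the median times from the latest old/new runs above can be turned into speedup factors. The variable names are made up for illustration. The medians themselves are copied from the benchmark output.

```julia
# Median times in ns, copied from the latest benchmark output in this thread.
old_median = (f64 = 30.1, f32 = 19.9)
new_median = (f64 = 7.4, f32 = 6.9)

# Speedup factor per element type, rounded to one decimal place.
speedup = map((o, n) -> round(o / n; digits=1), old_median, new_median)
```

By the medians, that works out to roughly 4.1x for Float64 and 2.9x for Float32.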
Anything left for me to do?
yeah, can you bump the patch number (in Project.toml) so that when we merge the PR, we can tag a new version?
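As a small illustration of what a patch bump means, the sketch below parses a stand-in project file with the TOML standard library and increments the last version component. The package name and version number are placeholders, not the real values from this repository. In practice the `version` line in Project.toml is usually just edited by hand.

```julia
using TOML

# Placeholder project file; the name and version number are hypothetical.
project = TOML.parse("""
name = "ExamplePackage"
version = "1.2.3"
""")

# Bump only the patch component: major.minor.patch -> major.minor.(patch + 1)
current = VersionNumber(project["version"])
bumped = VersionNumber(current.major, current.minor, current.patch + 1)
project["version"] = string(bumped)
```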
Thanks @AhmedYKadah for the PR and @devmotion for the additional review!
It's been a pleasure 😄