Faster than the current wrapper function call (including the Float32 call). Uses an algorithm based on https://github.com/ARM-software/optimized-routines/blob/master/math/erf.c

Mar 31 '25 15:03 AhmedYKadah

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 94.16%. Comparing base (46a2874) to head (f7470ba). :warning: Report is 27 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #491      +/-   ##
==========================================
+ Coverage   94.02%   94.16%   +0.13%     
==========================================
  Files          14       14              
  Lines        2897     2965      +68     
==========================================
+ Hits         2724     2792      +68     
  Misses        173      173

Flag	Coverage Δ
unittests	`94.16% <100.00%> (+0.13%)`	:arrow_up:

Mar 31 '25 16:03 codecov[bot]

Old

Float64 @benchmark SpecialFunctions.erf(data) setup=(data=6*rand(Float64)-3) samples=1000000

BenchmarkTools.Trial: 217729 samples with 1000 evaluations per sample.
 Range (min … max):  6.300 ns … 283.700 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     29.500 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   21.993 ns ± 13.209 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

Float32 @benchmark SpecialFunctions.erf(data) setup=(data=6*rand(Float32)-3) samples=1000000

BenchmarkTools.Trial: 312732 samples with 1000 evaluations per sample.
 Range (min … max):  4.300 ns … 125.100 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.900 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   15.035 ns ±  7.951 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

New

Float64 @benchmark erf(data) setup=(data=6*rand(Float64)-3) samples=1000000

BenchmarkTools.Trial: 507504 samples with 1000 evaluations per sample.
 Range (min … max):  5.400 ns … 4.890 μs   ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     8.700 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.775 ns ± 9.855 ns   ┊ GC (mean ± σ):  0.00% ± 0.00%

Float32 @benchmark Float32(erf(data)) setup=(data=6*rand(Float64)-3) samples=1000000

BenchmarkTools.Trial: 526797 samples with 1000 evaluations per sample.
 Range (min … max):  5.400 ns … 195.500 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     8.800 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.521 ns ±   2.236 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

Mar 31 '25 17:03 AhmedYKadah

A Float32 implementation is available, but it is not faster than the Float64 version because of an exp() call. The Float64 version is still faster than the old Float32 one.

Mar 31 '25 17:03 AhmedYKadah

Need to clean up the polynomial evaluations. The code could also use more organization.
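As one possible cleanup, hand-expanded polynomials could go through `Base.evalpoly`, which applies Horner's rule to a tuple of coefficients. A minimal sketch, assuming the textbook Maclaurin series of erf rather than the coefficients actually used in this PR; `ERF_MACLAURIN` and `erf_small` are hypothetical names, and the approximation is only meaningful for small |x|:

```julia
# Maclaurin coefficients of erf(x) * sqrt(pi) / (2x), in powers of x^2:
# (-1)^n / (n! * (2n + 1)) for n = 0..5.
# Illustrative only; these are NOT the coefficients used in the PR.
const ERF_MACLAURIN = (1.0, -1 / 3, 1 / 10, -1 / 42, 1 / 216, -1 / 1320)

# Hypothetical helper: erf for small |x| via a single Horner evaluation.
# evalpoly(z, coeffs) computes coeffs[1] + coeffs[2]*z + coeffs[3]*z^2 + ...
erf_small(x::Float64) = 2 / sqrt(pi) * x * evalpoly(x * x, ERF_MACLAURIN)

erf_small(0.1)  # close to erf(0.1); the truncation error grows quickly with |x|
```

Keeping each coefficient set in one `const` tuple puts the numbers in a single place per interval, which may also help with the organization concern.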

Apr 05 '25 13:04 AhmedYKadah

Remaining: erfc Float64 and Float32 implementations, and the erf Float32 implementation

Aug 02 '25 08:08 AhmedYKadah

Given that this is faster and accurate, seems good to merge to me!

Sep 15 '25 03:09 oscardssmith

Are there any tests for edge cases/ULP in the C version that we do not do ourselves?

Sep 16 '25 06:09 mschauer

Then we should also add a test for these
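For the ULP part, one way to quantify accuracy is to compare each result against a higher-precision reference and express the difference in units in the last place. A minimal sketch using only Base; `ulp_error` is a hypothetical helper, not something from SpecialFunctions.jl or the C test suite, and the reference is assumed to come from a BigFloat evaluation:

```julia
# Error of `approx` against a high-precision reference, measured in ULPs
# of the working type T (for example Float32 or Float64).
function ulp_error(approx::T, reference::BigFloat) where {T<:AbstractFloat}
    # Spacing between adjacent T values at the rounded reference.
    spacing = eps(T(reference))
    return Float64(abs(big(approx) - reference) / spacing)
end

# A value one representable step above 1.0 is exactly 1 ULP away from 1.
ulp_error(nextfloat(1.0), big(1.0))
```

This only makes sense for finite references: `eps(Inf)` is `NaN`, so inputs such as `Inf`, `-Inf`, and `NaN` would need direct equality or `isnan` checks instead.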

Sep 16 '25 07:09 mschauer

There aren't any tests for erfc. Is that expected?

Sep 19 '25 09:09 AhmedYKadah

Any other changes needed?

Sep 19 '25 10:09 AhmedYKadah

We should probably test erfc.

Sep 19 '25 12:09 oscardssmith

Other than missing tests for Inf, looks good to me. @devmotion any further suggestions?

Nov 14 '25 13:11 oscardssmith

What do the benchmarks show with the latest iteration of this PR? In https://github.com/JuliaMath/SpecialFunctions.jl/pull/491#issuecomment-2766902350 performance with Float32 seemed to regress.

Nov 14 '25 20:11 devmotion

Yes, you're correct. The old implementation was not efficient so I redid it.

Old

Float64 @benchmark erf(data) setup=(data=6*rand(Float64)-3) samples=1000000

BenchmarkTools.Trial: 217081 samples with 1000 evaluations per sample.
 Range (min … max):  6.400 ns … 973.200 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     30.100 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   22.028 ns ± 13.612 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

Float32 @benchmark erf(data) setup=(data=6*rand(Float32)-3) samples=1000000

BenchmarkTools.Trial: 308269 samples with 1000 evaluations per sample.
 Range (min … max):  4.300 ns … 160.400 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     19.900 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   15.207 ns ±  8.609 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

New

Float64 @benchmark erf(data) setup=(data=6*rand(Float64)-3) samples=1000000

BenchmarkTools.Trial: 604748 samples with 1000 evaluations per sample.
 Range (min … max):  4.700 ns … 164.400 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     7.400 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.302 ns ±   3.017 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

Float32 @benchmark erf(data) setup=(data=6*rand(Float32)-3) samples=1000000

BenchmarkTools.Trial: 641285 samples with 1000 evaluations per sample.
 Range (min … max):  4.200 ns … 204.800 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.900 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.840 ns ±   2.961 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

Nov 14 '25 21:11 AhmedYKadah

Anything left for me to do?

Nov 15 '25 09:11 AhmedYKadah

yeah, can you bump the patch number (in Project.toml) so that when we merge the PR, we can tag a new version?

Nov 15 '25 15:11 oscardssmith

Thanks @AhmedYKadah for the PR and @devmotion for the additional review!

Nov 15 '25 21:11 oscardssmith

It's been a pleasure 😄

Nov 15 '25 21:11 AhmedYKadah

Added erf(x) Float64 Julia implementation
