Combinatorics.jl icon indicating copy to clipboard operation
Combinatorics.jl copied to clipboard

Improve performance of `derangement`/`subfactorial` with iterative implementation

Open FedericoStra opened this issue 1 year ago • 3 comments

This PR changes the implementation of derangement, hence also subfactorial, to use the recursive formula !n = (n-1) * (!(n-1) + !(n-2)) presented here.

For values such as subfactorial(100) I get a huge speed-up, like ~10x.

FedericoStra avatar Dec 10 '23 19:12 FedericoStra

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (e6888be) 96.97% compared to head (afc071d) 96.87%.

Files Patch % Lines
src/factorials.jl 91.66% 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #146      +/-   ##
==========================================
- Coverage   96.97%   96.87%   -0.10%     
==========================================
  Files           7        7              
  Lines         728      737       +9     
==========================================
+ Hits          706      714       +8     
- Misses         22       23       +1     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Dec 10 '23 19:12 codecov[bot]

With this change I pass from

julia> @benchmark subfactorial(100)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  185.527 μs … 47.124 ms  ┊ GC (min … max):  0.00% … 60.54%
 Time  (median):     189.635 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   267.191 μs ±  1.711 ms  ┊ GC (mean ± σ):  14.89% ±  2.31%

  ▆█▇▆▆▅▄▃▂▃▃▃▃▂▁▁▁▂▁▁▁                                        ▂
  █████████████████████▇▇▇▅▆▇▅▄▆▆█▇█▇▇▇▇▇▆▆▅▅▃▅▄▄▃▄▄▃▁▄▁▁▄▃▃▁▃ █
  186 μs        Histogram: log(frequency) by time       273 μs <

 Memory estimate: 81.06 KiB, allocs estimate: 2931.

to

julia> @benchmark subfactorial(100)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  12.113 μs …  49.734 ms  ┊ GC (min … max):  0.00% … 60.39%
 Time  (median):     15.437 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   29.177 μs ± 762.966 μs  ┊ GC (mean ± σ):  25.19% ±  0.96%

          ▂▅▆█▂  ▂▄     ▄▅▁
  ▁▁▁▁▁▁▂▄█████▆▅███▅▅▃▅███▆▅▅▄▃▃▂▂▁▁▁▁▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  12.1 μs         Histogram: frequency by time         24.2 μs <

 Memory estimate: 14.09 KiB, allocs estimate: 499.

FedericoStra avatar Dec 11 '23 17:12 FedericoStra

Changes

  • I replaced the recursive formula with the simpler !n = n * !(n-1) + (-1)^n.
  • I replaced all operations on BigInts with in-place functions from Base.GMP.MPZ to reduce allocations.

Consequences The performance improved by another factor ~10x:

julia> @benchmark subfactorial(100)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.427 μs … 137.132 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.480 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.538 μs ±   1.549 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃█▆▆▄▂                                                      ▂
  ██████▆▃▃▃▅██▇▆▆▅▃▄▄▄▁▃▄▄▃▁▄▃▃▁▄▄▁▁▄▁▁▃▁▁▃▄▁▃▁▁▁▁▃▃▁▁▃▁▁▁▃▅ █
  1.43 μs      Histogram: log(frequency) by time      3.05 μs <

 Memory estimate: 112 bytes, allocs estimate: 11.

FedericoStra avatar Dec 13 '23 17:12 FedericoStra