Performance of samplers
Thanks to @simonbyrne's recent efforts, we already have some in-house samplers.
I just ran a systematic benchmark of the samplers implemented in this package (using BenchmarkLite.jl).
Here are the results. They are measured in MPS, that is, million samples per second; a larger number indicates higher throughput.
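For reference, MPS is just samples drawn divided by elapsed microseconds. A minimal illustration of the metric (a hypothetical helper, not code from the benchmark script):

# Hypothetical helper illustrating the reported metric, not part of the
# benchmark script: MPS = n / (elapsed seconds * 1e6).
function mps(f, n::Int)
    t = @elapsed for _ in 1:n
        f()
    end
    return n / (t * 1e6)
end

# e.g. mps(() -> rand(), 10_000_000)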
Categorical:

    K  |  direct    alias
--------------------------
    2  | 88.4558  13.4906
    4  | 16.1752  13.0807
    8  | 15.3739  13.3384
   16  | 14.2149  13.3299
   32  | 12.3867  13.2733
   64  | 10.0650  13.2782
  128  |  7.3457  13.3439
  256  |  4.7721  13.3459
  512  |  2.8074  13.4493
 1024  |  1.5340  13.4501
 2048  |  0.8097  13.3803
 4096  |  0.4103  13.3838
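The crossover around K ≈ 32 makes sense: direct sampling scans the probability vector, costing O(K) per draw, while the alias table answers in O(1) per draw after an O(K) setup. A minimal sketch of the direct method (illustrative, not the package's exact code):

# Direct categorical sampling by CDF inversion: scan the probability
# vector until the running sum exceeds a uniform draw. O(K) per sample.
function direct_categorical(p::AbstractVector{Float64})
    u = rand()
    c = 0.0
    for k in eachindex(p)
        c += p[k]
        u <= c && return k
    end
    return lastindex(p)  # guard against round-off when u ≈ 1
end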
Binomial:

(n, p)     |   rmath    alias     Geom     BTPE     Poly
---------------------------------------------------------
(2,0.3)    | 11.8487  12.7639   6.4979   3.7020   6.4290
(2,0.5)    | 11.9356  12.8119   4.5564   5.8752   4.5401
(2,0.9)    | 12.6947  12.6949  10.7287   3.7423  10.9793
(4,0.3)    | 11.4277  12.8164   4.1940   2.7137   4.1481
(4,0.5)    | 11.3381  12.8352   2.8394   5.9129   2.8481
(4,0.9)    | 12.4037  12.9807   8.3579   3.7877   8.3253
(8,0.3)    | 10.6993  12.9467   2.5408   5.9532   2.5291
(8,0.5)    | 10.3141  12.7722   1.6785   1.1590   1.6710
(8,0.9)    | 11.9544  12.9081   5.6124   3.6505   5.5750
(16,0.3)   |  9.7718  12.9083   1.4662   0.4141   1.4124
(16,0.5)   |  9.0507  12.9732   0.9400   2.1002   0.9370
(16,0.9)   | 11.2148  12.9855   3.5116   4.4885   3.3525
(32,0.3)   |  8.3341  13.0976   0.8067   2.1582   0.8043
(32,0.5)   |  7.0369  13.0287   0.5004   2.4892   0.4992
(32,0.9)   | 10.4527  13.1709   2.0597   5.9590   2.0518
(64,0.3)   |  6.3946  13.2282   0.4253   2.5309   0.4250
(64,0.5)   |  7.0507  13.1909   0.2590   2.7349   2.7035
(64,0.9)   |  9.3033  13.2035   1.1612   1.7338   1.1599
(128,0.3)  |  6.9509  13.2026   0.2180   2.7592   2.7444
(128,0.5)  |  7.3655  12.7911   0.1310   2.9881   2.9361
(128,0.9)  |  7.6165  13.2572   0.6239   2.2733   0.6288
(256,0.3)  |  7.2336  13.2470   0.1104   3.1682   3.1921
(256,0.5)  |  7.6335  13.2319   0.0651   3.2939   3.2998
(256,0.9)  |  5.5145  13.2989   0.3247   2.6314   2.6450
(512,0.3)  |  7.6033  13.3139   0.0558   3.3523   3.3676
(512,0.5)  |  8.1060  13.3286   0.0328   3.5462   3.5645
(512,0.9)  |  6.9150  13.3382   0.1652   2.9492   2.9943
(1024,0.3) |  8.2251  13.2799   0.0281   3.6457   3.6760
(1024,0.5) |  8.5567  13.2236   0.0170   3.7714   3.7975
(1024,0.9) |  7.2475  13.3995   0.0830   3.2097   3.2585
(2048,0.3) |  8.6872  13.3920   0.0137   3.8372   3.6625
(2048,0.5) |  8.9633  13.3610   0.0084   3.9388   3.8654
(2048,0.9) |  7.9270  13.3193   0.0420   3.5115   3.4960
(4096,0.3) |  8.8841  13.3026   0.0071   4.0173   3.9788
(4096,0.5) |  9.3080  13.3894   0.0043   4.1169   4.0903
(4096,0.9) |  8.3456  13.4372   0.0214   3.7510   3.7052
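The Geom column decays with n for a structural reason: geometric-gap sampling draws the waiting time between successes, so it needs roughly n*p + 1 iterations per sample. A rough sketch of the idea (not the package's exact implementation; assumes 0 < p < 1):

# Binomial sampling via geometric gaps between successes.
function binomial_geom(n::Int, p::Float64)
    x = 0  # number of successes so far
    y = 0  # index of the trial at which the last success occurred
    while true
        # geometric gap to the next success, drawn by inversion
        y += floor(Int, log(rand()) / log1p(-p)) + 1
        y > n && return x
        x += 1
    end
end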
Poisson:

  μ    |   rmath    count       AD
-----------------------------------
  0.2  | 15.3078  13.3549      NaN
  0.5  | 14.1964  10.4140      NaN
  1.0  | 13.8194   7.8061      NaN
  2.0  | 13.8571   5.2174      NaN
  5.0  | 13.4399   2.7030  20.6806
 10.0  | 11.9644   1.4928  20.6382
 15.0  | 12.1198   1.0314  21.1652
 20.0  | 12.0475   0.7907  22.0989
 30.0  | 12.2999   0.5376  22.6501
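The roughly 1/μ decay of the count column fits the classic counting method, which multiplies uniforms until the product drops below exp(-μ) and therefore consumes about μ + 1 uniforms per sample. A sketch (presumably the idea behind count, though not its exact code; note exp(-μ) underflows for very large μ):

# Knuth-style counting sampler for Poisson(μ).
function poisson_count(μ::Float64)
    L = exp(-μ)
    k = 0
    w = rand()
    while w > L
        k += 1
        w *= rand()
    end
    return k
end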
Exponential:

scale |    base    logu
------------------------
 1.0  | 17.0627  8.0938
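Presumably logu is plain inversion and base wraps Base's ziggurat-based randexp; the inversion variant is a one-liner (sketch):

# Inversion: if U ~ Uniform(0,1), then -log(U) ~ Exponential(1).
sample_exp(scale) = -scale * log(rand())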
Gamma:

(α, scale) |   rmath      MT
-----------------------------
(0.5,1.0)  |  7.8765  3.6338
(1.0,1.0)  |  9.8628  5.7221
(2.0,1.0)  | 10.0352  5.8673
(5.0,1.0)  | 12.2446  5.8372
(20.0,1.0) | 13.0004  5.9519
We are still falling behind Rmath (over 2x slower) for Binomial and Gamma distributions.
The benchmark script is perf/samplers.jl.
That's interesting, I get different results when using the standard rand methods:
using Distributions
using RmathDist
then after warmup
julia> @time rand(Gamma(5.0,1.0),10_000_000)
elapsed time: 0.415310414 seconds (80000152 bytes allocated)
julia> @time rand(Rmath(Gamma(5.0,1.0)),10_000_000)
elapsed time: 0.4185569 seconds (80000176 bytes allocated)
The gamma distribution still uses Rmath.
I temporarily turned off the use of GammaMTSampler in a recent refactoring and haven't turned it back on yet.
Ah, I see. But now I get:
julia> @time rand(Distributions.GammaMTSampler(5.0,1.0),10_000_000)
elapsed time: 0.501402219 seconds (80000168 bytes allocated)
which is only 20% worse.
That's probably because memory allocation and memory access factor in, so the difference is not so big.
What I did in the benchmark is basically the following:
# Not exactly the same code, but it does things the same way as the benchmark:
# draw n samples and discard them, so only the sampling cost is measured.
function batch_sample!(sampler, n::Int)
    for _ in 1:n
        rand(sampler)
    end
    return nothing
end
It purely does sampling, so no memory-related overhead is counted in the benchmark.
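To make the comparison apples-to-apples, one can time the array-returning call against a discard loop on the same sampler object (a hypothetical comparison; loop_only is just an illustrative helper, and the constructor form follows the usage earlier in this thread):

using Distributions

function loop_only(s, n::Int)
    for _ in 1:n
        rand(s)   # draw and discard: pure sampling cost
    end
end

s = Distributions.GammaMTSampler(5.0, 1.0)  # constructor as used above
@time rand(s, 10_000_000)       # includes allocating the ~80 MB result vector
@time loop_only(s, 10_000_000)  # pure sampling, no allocation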
This one is even stranger: your tests suggest the Poisson AD sampler is twice as fast, yet:
julia> a = Distributions.PoissonADSampler(20.0)
PoissonADSampler(20.0,4.47213595499958,2400.0,18)
julia> r = Distributions.PoissonRmathSampler(20.0)
PoissonRmathSampler(20.0)
(then after warmup)
julia> @time rand(a,10_000_000);
elapsed time: 0.634922107 seconds (80000128 bytes allocated)
julia> @time rand(r,10_000_000);
elapsed time: 0.440410505 seconds (80000128 bytes allocated)
I've added some more functionality to perf/samplers.jl:
The batch method just tests the performance of the iteration (old behaviour):
$ julia perf/samplers.jl batch gamma_hi
BenchmarkTable [unit = mps]
Dist                             |   rmath       MT       GD
-------------------------------------------------------------
(Gamma(shape=1.5, scale=1.0),)   | 17.0605  20.6627  14.9219
(Gamma(shape=2.0, scale=1.0),)   | 16.1565  20.0422  14.9253
(Gamma(shape=3.0, scale=1.0),)   | 16.4016  20.1534  14.3530
(Gamma(shape=5.0, scale=1.0),)   | 25.0876  21.1654  27.6697
(Gamma(shape=20.0, scale=1.0),)  | 29.6898  21.9201  34.6155
The indiv method tests the performance of construction plus iteration (this may be misleading for Rmath, as it uses static variables):
$ julia perf/samplers.jl indiv gamma_hi
BenchmarkTable [unit = mps]
Dist                             |   rmath       MT       GD
-------------------------------------------------------------
(Gamma(shape=1.5, scale=1.0),)   | 15.7482  19.1207   7.5798
(Gamma(shape=2.0, scale=1.0),)   | 15.7333  19.5103   7.7571
(Gamma(shape=3.0, scale=1.0),)   | 16.1120  19.2441   7.7500
(Gamma(shape=5.0, scale=1.0),)   | 24.3544  19.8842   9.8533
(Gamma(shape=20.0, scale=1.0),)  | 27.0493  20.6376  11.4383
There are obvious differences here. If the RNG is initialised to the same state, do we want both rand(d,N) and [rand(d) for i=1:N] to give identical results?
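The question is easy to test empirically (a sketch; seeding via the Random stdlib):

using Distributions, Random

d = Gamma(5.0, 1.0)

Random.seed!(1234)
a = rand(d, 1000)

Random.seed!(1234)
b = [rand(d) for _ in 1:1000]

a == b  # true only if both code paths consume the RNG in the same order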
Is this an issue worth keeping around? It likely can't be "fixed" directly. cc @simonbyrne @lindahua
Likely related to this: the perf folder is in a deprecated state and should either be removed or updated.
I'd support / help with work on this issue. I keep running into this when figuring out the best way to sample from Categorical distributions. It looks like the poly-algorithm in StatsBase.jl,
https://github.com/JuliaStats/StatsBase.jl/blob/41669cd8dfeadce35db0c4e07ac7afe5d10fb957/src/sampling.jl#L837
performs better as it has a nice rule of thumb (presumably based on the benchmarks from @lindahua at the beginning of this issue) for choosing the alias table vs direct sampling.
It would be nice to get the same performance no matter how you sample from a categorical 😄
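For illustration, the decision rule could look roughly like this (hypothetical; the actual thresholds live in the StatsBase source linked above, and the K ≈ 32 cutoff here is just read off the Categorical table at the top of this issue):

# Pick a categorical sampling strategy from K (number of categories)
# and the number of samples requested.
function choose_categorical_sampler(K::Int, n_samples::Int)
    if K < 32 || n_samples < K
        return :direct  # O(K) per draw, zero setup cost
    else
        return :alias   # O(K) setup, then O(1) per draw
    end
end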
This is a very old issue, but I just discovered that the cheap acceptance check in the GammaMTSampler seems to be incorrect: https://github.com/JuliaStats/Distributions.jl/pull/1617#discussion_r970098297
Fixing it improves performance quite significantly:
$ env gamma=true julia --project=. samplers.jl
[ Info: Gamma
[ Info: Low
[ Info: GammaGSSampler
[ Info: α: 0.1, result: Trial(30.570 ns)
[ Info: α: 0.5, result: Trial(45.151 ns)
[ Info: α: 0.9, result: Trial(49.529 ns)
[ Info: GammaIPSampler
[ Info: α: 0.1, result: Trial(24.658 ns)
[ Info: α: 0.5, result: Trial(25.022 ns)
[ Info: α: 0.9, result: Trial(24.698 ns)
[ Info: High
[ Info: GammaMTSampler
[ Info: α: 1.5, result: Trial(13.292 ns)
[ Info: α: 2.0, result: Trial(13.209 ns)
[ Info: α: 3.0, result: Trial(12.950 ns)
[ Info: α: 5.0, result: Trial(13.127 ns)
[ Info: α: 20.0, result: Trial(13.135 ns)
[ Info: GammaGDSampler
[ Info: α: 1.5, result: Trial(29.132 ns)
[ Info: α: 2.0, result: Trial(27.114 ns)
[ Info: α: 3.0, result: Trial(24.347 ns)
[ Info: α: 5.0, result: Trial(18.196 ns)
[ Info: α: 20.0, result: Trial(16.562 ns)
With the fix, GammaMTSampler seems to be faster than GammaGDSampler for all tested parameter values (both here and in https://github.com/JuliaStats/Distributions.jl/pull/1617).
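For context, the Marsaglia–Tsang method with its cheap "squeeze" check looks roughly like this for Gamma(α, 1) with α ≥ 1 (textbook form, not a copy of the package's code; the corrected expression used by Distributions is in the linked PR):

# Marsaglia–Tsang (2000): accept-reject around a transformed normal proposal.
function gamma_mt(α::Float64)
    d = α - 1/3
    c = 1 / sqrt(9d)
    while true
        x = randn()
        v = (1 + c * x)^3
        v <= 0 && continue                 # proposal outside the support; retry
        u = rand()
        # cheap (squeeze) check: accepts most proposals without any log calls
        u < 1 - 0.0331 * x^4 && return d * v
        # full acceptance check
        log(u) < x^2 / 2 + d * (1 - v + log(v)) && return d * v
    end
end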