Distributions.jl perf improvements to MultinomialSampler

from slack:

regarding https://github.com/JuliaStats/Distributions.jl/pull/1831/files I realized that the portion

# Use an alias table
fill!(x, zero(eltype(x)))
a = s.alias
for i = 1:n
    x[rand(rng, a)] += 1
end

in _rand!(rng::AbstractRNG, s::MultinomialSampler, x::AbstractVector{<:Real}) can be further improved by making the internal call to rand(rng, s.alias) generate the indices & acceptance probabilities as a batch, something more like

function _rand2!(rng::AbstractRNG, x, s::AliasTable, n, r_idx, r_acc)
    fill!(x, zero(eltype(x)))
    rand!(rng, r_idx, 1:length(s.alias))
    rand!(rng, r_acc)

    @inbounds for i = 1:n
        i2 = r_idx[i]
        x[ifelse(r_acc[i] < s.accept[i2], i2, s.alias[i2])] += 1
    end
end

since Base.rand! itself is more efficient to fill in batch rather than called repeatedly in a loop however this will either introduce a new allocation (doesn't really impact performance, but going from 0 --> >0 feels bad), or require scratch space in (placeholder) r_idx and r_acc. would it be appropriate here to include some keyword to designate scratch space? where should that go? I'd prefer to be able to add API to call the alias table sampling in a loop directly, since currently it seems the branch for multinom_rand! is never worth it unless n >> k in case this is motivating information, with both changes together multinomial sampling can be about 4x faster than what is currently live (edited)

Jan 31 '24 20:01 adienes

Can't we just use Random.Sampler(..., Val(Inf)) to get similar optimizations for repeated sampling as rand! calls into a vector (see https://docs.julialang.org/en/v1/stdlib/Random/#Random.Sampler)?

Jan 31 '24 20:01 devmotion

in this case, it's not really "repeated sampling" in the same sense as most other samplers, as this represents a single draw from a Multinomial , which just so happens to be implemented by repeated sampling from a Categorical vis a vis an AliasTable. if you call rand! on a MultinomialSampler it wants a vector of vectors, one for each draw.

in my main use-case for this I will be calling _rand! directly ---- I think the interfaces are not really well-prepared to have the random return values themselves be mutable, as most are scalar.

Jan 31 '24 20:01 adienes

if the idea looks ok I'll go fix all the tests and such, just wanted to hold off on that if this cannot be merged for other reasons

Jan 31 '24 21:01 adienes

benchmarks of 4 different attempts to multinomial sampling. status quo is more or less equivalent to binom column here, which calls multinom_rand!

so there is definitely opportunity to do a lot better at some ranges of input. marking this as draft so I can do more thorough work

note that direct_binsearch includes the time to compute cdf(probs). if somehow this can be computed in advance, it's much faster. similarly sample_at has to construct an AliasTable

Feb 02 '24 23:02 adienes

a challenge here is that the "best" choice can be any of 4-5 different algorithms (with significant performance implications) depending not only on n and k but also on the number of times (say, M) the sampler will be drawn from, as the cost of constructing the sampler itself can be large.

It seems ordinarily the convention is to assume any cost of construction is small compared to cost of repeated sampling, i.e. M >> n,k . But I think this is unlikely to be the case, especially when one of n or k is large.

what's a good path forward? maybe to implement each approach separately, document the performance tradeoffs, and throw heuristics into a MultinomialPolySampler ?

Feb 05 '24 17:02 adienes

cleaned up the test failures, so this is ready for review. just as a reminder of the motivation, performance is improved across the board for all parameter choices, and for some parts of the rangen < k, this implementation is up to 20-100x faster than status quo

I think I should walk back a bit some of the attempt at more algorithm heuristics, as in the case when the number of samples is large (as would be expected from a Sampler) one of the two choices of AliasTable vs Binomial sampling will be optimal

so, the overall functionality now is basically identical to status quo, except that the parameter to switch between alias table & binomial sampling is better tuned, and there is a specific optimization to make use of vectorized rand calls

the only downside is that now sampler(::Multinomial) is type-unstable, but imo that is well worth it.

Feb 19 '24 17:02 adienes

Codecov Report

Attention: Patch coverage is 78.94737% with 4 lines in your changes are missing coverage. Please review.

Project coverage is 85.97%. Comparing base (818814f) to head (5f39ae0).

Files	Patch %	Lines
src/samplers/multinomial.jl	70.00%	3 Missing :warning:
src/multivariate/multinomial.jl	88.88%	1 Missing :warning:

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #1834   +/-   ##
=======================================
  Coverage   85.96%   85.97%           
=======================================
  Files         144      144           
  Lines        8647     8653    +6     
=======================================
+ Hits         7433     7439    +6     
  Misses       1214     1214

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Feb 19 '24 18:02 codecov-commenter

@devmotion I'm sorry to pester, I know your attention is probably stretched thin being basically one of only a small number of maintainers for a package as popular as this

but does a change like this seem desired at all? if there is not a path to merge I am happy to close the PR and just put the logic into my local project

Feb 28 '24 17:02 adienes

@andreasnoack is probably also equally stretched. Pinging him here as well. @devmotion is everywhere and certainly stretched thin, I would imagine!

We should also see if we can grow the number of maintainers for this package, and if there are specific folks we should give commit access to, I would love to at least facilitate the conversation.

Feb 28 '24 19:02 ViralBShah

Ah, I didn't realize that the status of the PR was changed to "ready for review".

Some high-level comments right away:

I'm all for better performance if the performance improvement is demonstrated (which seems to be the case?) and justifies the possible increased code complexity
sampler(::Distribution) is already type unstable for a few other distributions (exactly to choose an optimized sampling algorithm), so this seems fine to me (in rand we have a function barrier that minimizes the practical effect of the type instability)
IIRC the typical pattern is to branch in sampler(::Distribution) - the abstract super type and its "constructor" seems to be an approach that is not used by other samplers in Distributions. I suggest following the example of the other samplers and removing the abstract super type.

Feb 28 '24 19:02 devmotion

the abstract super type and its "constructor" seems to be an approach that is not used by other samplers in Distributions.

fair point, happy to address this

only question being: what should happen to the existing name MultinomialSampler ? I guess it is neither exported nor has a docstring, so it's ok to remove?

Feb 28 '24 20:02 adienes

If it's not exported, it should be fine to remove it. But probably a bit safer would it to deprecate the type with e.g.

Base.@deprecate_binding MultinomialSampler MultinomialSamplerSequential false

(by default the macro exports the deprecated type; with false it is not exported). See, e.g., https://invenia.github.io/blog/2022/06/17/deprecating-in-julia/

Feb 29 '24 02:02 devmotion

thanks for the comments -- let me know if the most recent commit addresses them appropriately

Mar 03 '24 01:03 adienes

friendly bump

Mar 16 '24 22:03 adienes

I hope I have addressed all comments --- the benchmarks still look good on my end so nothing regressed (as expected)

Mar 31 '24 18:03 adienes

took advantage of latest changes to AliasTable

the approach

    rand!(rng, s.scratch_alias_rng)

    for r in s.scratch_alias_rng
        x[AliasTables.sample(r, s.alias.at)] += 1
    end

to batch sample the seed is slightly faster (5-10%) than the existing x[rand(s.alias.at)] loop

Apr 20 '24 16:04 adienes

another friendly bump :)

May 02 '24 18:05 adienes

I'm sorry to be a nag, but bumping again.

May 19 '24 14:05 adienes

Is there a reason this wasn't merged?

Jun 26 '24 15:06 brenhinkeller