perf improvements to MultinomialSampler
from slack:
regarding https://github.com/JuliaStats/Distributions.jl/pull/1831/files I realized that the portion
# Use an alias table
fill!(x, zero(eltype(x)))
a = s.alias
for i = 1:n
x[rand(rng, a)] += 1
end
in _rand!(rng::AbstractRNG, s::MultinomialSampler, x::AbstractVector{<:Real})
can be further improved by making the internal call to rand(rng, s.alias) generate the indices & acceptance probabilities as a batch, something more like
function _rand2!(rng::AbstractRNG, x, s::AliasTable, n, r_idx, r_acc)
fill!(x, zero(eltype(x)))
rand!(rng, r_idx, 1:length(s.alias))
rand!(rng, r_acc)
@inbounds for i = 1:n
i2 = r_idx[i]
x[ifelse(r_acc[i] < s.accept[i2], i2, s.alias[i2])] += 1
end
end
since Base.rand! itself is more efficient to fill in batch rather than called repeatedly in a loop
however this will either introduce a new allocation (doesn't really impact performance, but going from 0 --> >0 feels bad), or require scratch space in (placeholder) r_idx and r_acc.
would it be appropriate here to include some keyword to designate scratch space? where should that go? I'd prefer to be able to add API to call the alias table sampling in a loop directly, since currently it seems the branch for multinom_rand! is never worth it unless n >> k
in case this is motivating information, with both changes together multinomial sampling can be about 4x faster than what is currently live (edited)
Can't we just use Random.Sampler(..., Val(Inf)) to get similar optimizations for repeated sampling as rand! calls into a vector (see https://docs.julialang.org/en/v1/stdlib/Random/#Random.Sampler)?
in this case, it's not really "repeated sampling" in the same sense as most other samplers, as this represents a single draw from a Multinomial , which just so happens to be implemented by repeated sampling from a Categorical vis a vis an AliasTable. if you call rand! on a MultinomialSampler it wants a vector of vectors, one for each draw.
in my main use-case for this I will be calling _rand! directly ---- I think the interfaces are not really well-prepared to have the random return values themselves be mutable, as most are scalar.
if the idea looks ok I'll go fix all the tests and such, just wanted to hold off on that if this cannot be merged for other reasons
benchmarks of 4 different attempts to multinomial sampling. status quo is more or less equivalent to binom column here, which calls multinom_rand!
so there is definitely opportunity to do a lot better at some ranges of input. marking this as draft so I can do more thorough work
note that direct_binsearch includes the time to compute cdf(probs). if somehow this can be computed in advance, it's much faster. similarly sample_at has to construct an AliasTable
a challenge here is that the "best" choice can be any of 4-5 different algorithms (with significant performance implications) depending not only on n and k but also on the number of times (say, M) the sampler will be drawn from, as the cost of constructing the sampler itself can be large.
It seems ordinarily the convention is to assume any cost of construction is small compared to cost of repeated sampling, i.e. M >> n,k . But I think this is unlikely to be the case, especially when one of n or k is large.
what's a good path forward? maybe to implement each approach separately, document the performance tradeoffs, and throw heuristics into a MultinomialPolySampler ?
cleaned up the test failures, so this is ready for review. just as a reminder of the motivation, performance is improved across the board for all parameter choices, and for some parts of the rangen < k, this implementation is up to 20-100x faster than status quo
I think I should walk back a bit some of the attempt at more algorithm heuristics, as in the case when the number of samples is large (as would be expected from a Sampler) one of the two choices of AliasTable vs Binomial sampling will be optimal
so, the overall functionality now is basically identical to status quo, except that the parameter to switch between alias table & binomial sampling is better tuned, and there is a specific optimization to make use of vectorized rand calls
the only downside is that now sampler(::Multinomial) is type-unstable, but imo that is well worth it.
Codecov Report
Attention: Patch coverage is 78.94737% with 4 lines in your changes are missing coverage. Please review.
Project coverage is 85.97%. Comparing base (
818814f) to head (5f39ae0).
| Files | Patch % | Lines |
|---|---|---|
| src/samplers/multinomial.jl | 70.00% | 3 Missing :warning: |
| src/multivariate/multinomial.jl | 88.88% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #1834 +/- ##
=======================================
Coverage 85.96% 85.97%
=======================================
Files 144 144
Lines 8647 8653 +6
=======================================
+ Hits 7433 7439 +6
Misses 1214 1214
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@devmotion I'm sorry to pester, I know your attention is probably stretched thin being basically one of only a small number of maintainers for a package as popular as this
but does a change like this seem desired at all? if there is not a path to merge I am happy to close the PR and just put the logic into my local project
@andreasnoack is probably also equally stretched. Pinging him here as well. @devmotion is everywhere and certainly stretched thin, I would imagine!
We should also see if we can grow the number of maintainers for this package, and if there are specific folks we should give commit access to, I would love to at least facilitate the conversation.
Ah, I didn't realize that the status of the PR was changed to "ready for review".
Some high-level comments right away:
- I'm all for better performance if the performance improvement is demonstrated (which seems to be the case?) and justifies the possible increased code complexity
sampler(::Distribution)is already type unstable for a few other distributions (exactly to choose an optimized sampling algorithm), so this seems fine to me (inrandwe have a function barrier that minimizes the practical effect of the type instability)- IIRC the typical pattern is to branch in
sampler(::Distribution)- the abstract super type and its "constructor" seems to be an approach that is not used by other samplers in Distributions. I suggest following the example of the other samplers and removing the abstract super type.
the abstract super type and its "constructor" seems to be an approach that is not used by other samplers in Distributions.
fair point, happy to address this
only question being: what should happen to the existing name MultinomialSampler ? I guess it is neither exported nor has a docstring, so it's ok to remove?
If it's not exported, it should be fine to remove it. But probably a bit safer would it to deprecate the type with e.g.
Base.@deprecate_binding MultinomialSampler MultinomialSamplerSequential false
(by default the macro exports the deprecated type; with false it is not exported). See, e.g., https://invenia.github.io/blog/2022/06/17/deprecating-in-julia/
thanks for the comments -- let me know if the most recent commit addresses them appropriately
friendly bump
I hope I have addressed all comments --- the benchmarks still look good on my end so nothing regressed (as expected)
took advantage of latest changes to AliasTable
the approach
rand!(rng, s.scratch_alias_rng)
for r in s.scratch_alias_rng
x[AliasTables.sample(r, s.alias.at)] += 1
end
to batch sample the seed is slightly faster (5-10%) than the existing x[rand(s.alias.at)] loop
another friendly bump :)
I'm sorry to be a nag, but bumping again.
Is there a reason this wasn't merged?