JET reports type-instability when sampling from `Beta`
Consider the following:
julia> @report_opt rand(Beta(1, 1), 10)
═════ 1 possible error found ═════
┌ rand(s::Beta{Float64}, dims::Int64) @ Distributions /Users/bvdmitri/.julia/packages/Distributions/fgrZq/src/genericrand.jl:22
│┌ rand(::Random.TaskLocalRNG, ::Beta{Float64}, ::Int64) @ Distributions /Users/bvdmitri/.julia/packages/Distributions/fgrZq/src/genericrand.jl:24
││┌ rand(rng::Random.TaskLocalRNG, s::Beta{Float64}, dims::Tuple{Int64}) @ Distributions /Users/bvdmitri/.julia/packages/Distributions/fgrZq/src/genericrand.jl:52
│││ runtime dispatch detected: Distributions.rand!(rng::Random.TaskLocalRNG, %3::Distributions.BetaSampler{Float64}, %2::Vector{Float64})::Vector{Float64}
││└────────────────────
The Normal works just fine
julia> @report_opt rand(Normal(1, 1), 10)
No errors detected
Also sampling a single point is fine
julia> @report_opt rand(Beta(1, 1))
No errors detected
This harms performance, when sampling "inplace" with Beta (with rand!)
The reason for this observation is that sampler(Beta(...)) is not type stable: Depending on the parameters, a different algorithm is used for sampling. https://github.com/JuliaStats/Distributions.jl/pull/1350 "fixed" the performance and allocation issues arising from this type instability with a function barrier. Based on the benchmarks in that PR it seems it doesn't harm performance anymore.
Ok, that explains a bit the situation. The real issue is that sampler(Beta(..., ...)) is type-unstable and depends on the parameters. But can it always return the same object, which would simply have an if inside depending on the parameters? E.g.
sampler(d::Beta) = BetaSampler(d, other_fields...) # type-stable, always returns the same
function rand!(rng, sampler::BetaSampler, container)
if some_condition(sampler)
one_algorithm!(rng, container)
else
another_algorithm!(rng, container)
end
end
Or a specialized method for rand! for Beta would also solve this issue entirely, e.g.
function rand!(rng, dist::Beta, n)
if some_condition(dist)
rand!(rng, OnerBetaSampler(dist), n)
else
rand!(rng, AnotherBetaSampler(dist), n)
end
end
But can it always return the same object
The motivation for sampler is to precompute algorithm-dependent quantities when drawing multiple variates. The number and type of these quantities are quite different for different algorithms, as you can see in https://github.com/JuliaStats/Distributions.jl/blob/master/src/samplers/gamma.jl. Always computing all of these would be a bit wasteful I think.
Or a specialized method for rand! for Beta would also solve this issue entirely
I still wonder, is there any issue here? It seems the function barrier in #1350 fixed the performance issues.
It shouldn't recompute algorithm-dependent quantities in my second proposal with a specialized rand! since it basically exactly the same code with an explicit static if statement (multiple dispatch is really just a sophisticated runtime if statement).
To answer your question about the performance, yes, it does affect performance in the following case:
julia> foo(x, dist) = foreach(1:100_000) do _
rand!(dist, x)
end
julia> x = zeros(10);
julia> dist = Beta(1, 1)
Beta{Float64}(α=1.0, β=1.0)
julia> sr = sampler(dist)
julia> @btime foo($x, $dist);
30.232 ms (100000 allocations: 6.10 MiB)
julia> @btime foo($x, $sr);
27.726 ms (0 allocations: 0 bytes)
It's not much in this particular example, but it does accumulate in our larger program and it also allocates extra (while the whole idea of rand! is to sample in place)
Another thing, I mentioned that sampling a single point is fine and is in fact type-stable.
julia> @report_opt rand(Beta(1, 1))
No errors detected
That suggests that there is a potential for improvement for the in-place version
To answer your question about the performance, yes, it does affect performance in the following case:
The example is a bit contrieved since ideally you would use the sampler if you want to sample from the same distribution in a loop (similar to how the Random docs show that you should construct and use a Random.Sampler if you want to call rand repeatedly - BTW there's an issue to unify sampler with Random.Sampler a bit more: #1316).
The single variate sampling does not use samplers, hence it behaves differently.
In the real code the distribution is not the same unfortunately