Distributions.jl
Distributions.jl copied to clipboard
Plans for RNG changes?
Currently, StatsFuns' distribution RNGs such as rand(Poisson(lambda)) are the de-facto standard. However, they are implemented using the Rmath library. This leads to some issues.
- Julia is actively developing more RNGs via RNG.jl as part of a GSoC project. It would be nice to be able to swap which backend RNG is used in these algorithms (some are better for parallel, some have lower accracy but are faster, etc.). The Rmath library had to be modified to use the DSFMT RNG, but going forward it doesn't seem like it would be easy to have it interface with all of these different RNGs.
- Minor point as mentioned in JuliaLang/Rmath-julia#11, the return types of all of the RNGs from Rmath is a float, then needs to be converted to an Int.
- These methods aren't optimized for generating a vector of random numbers under certain circumstances. For example, though R uses Rmath, its rpois interfaces with it in such a way that it gets speedups if the same lambda is repeated.
Should this be addressed here in StatsFuns, or should this kind of RNG functionality be moved to RNG.jl (when it's more developed) (or its own package build upon RNG.jl)?
Note that this these RNGs were found to be the bottleneck of an performance comparison, which is why I am interested in speeding them up.
In my opinion, these should all be in Distributions. We need Rmath to cover cases where we don't have good samplers yet but hopefully we can just switch the rand methods one by one to Julia implementations. Meanwhile all of StatsFuns should be merged into Distributions.
These methods aren't optimized for generating a vector of random numbers under certain circumstances. For example, though R uses Rmath, its rpois interfaces with it in such a way that it gets speedups if the same lambda is repeated.
Actually, this is done in Rmath itself, so we should get the same benefit (though it's done via C static variables, which is not thread safe).
I'd like to second the call for better handling of vector random variate generation. For example, generating Gamma variates requires 2 allocations per variate even when I use Distributions.rand! to pre-allocate the array. For example, the following does over 2 million allocations to allocate 1 million gammas:
using Distributions gamma = Gamma() samples = zeros(1_000_000) @time Distributions.rand!( gamma, samples );