Distributions.jl
Distributions.jl copied to clipboard
Computed field len and rand now returns type T in Uniform{T}
Added computed field len which is the length of the interval to be sampled from. Also replaced all instances of (d.b-d.a) by the new field d.len.
Fixed an issue where rand(::Uniform{Float32}) (and other, non Float64 types) did not in fact return a Float32, but a Float64, which caused some performance issues. This was done by modifiying the method rand(rng::AbstractRNG, d::Uniform) to include the subtype of Uniform and pass it to rand.
This was discussed on discourse
The rand
method follows the current design, currently almost all distributions return samples of type Float64
and Int
and intentionally the random variate type is independent of the parameter types (there are many issues and discussions in the repo). I would like to improve this design, and I've discussed a better design with some other contributors, but this requires more and different changes. We should not introduce an additional inconsistency by making rand
of Uniform
special.
Regarding len
: IMO we should not add such a field since it is not an actual parameter, could easily cause inconsistencies and I think your example should be fixed differently. As in Random, you should construct a sampler when you perform repeated sampling (unfortunately, the syntax is a bit different in Distributions, possibly because it predates (?) the interface in Random). len
should only be computed when constructing such a sampler.
In particular, this corresponds to issue #1783.
With the current design, the "correct" fix would be to always return Float64 as the variates are supposed to be of type Float64. But as said, I'd rather fix this design. I'll try to write up our suggestions and draft a PR within the next few days.
Why would one expect a Float64 result when sampling a distribution with parameters only specified to Float32 precision, especially if the parameters are from the same space as the random variable?
I think that API to specify the output type would be nice, but in the absence of such an initiative and given the fact that there is no promise regarding Float64 output, the proposed change looks reasonable to me.
The
rand
method follows the current design, currently almost all distributions return samples of typeFloat64
andInt
and intentionally the random variate type is independent of the parameter types (there are many issues and discussions in the repo). I would like to improve this design, and I've discussed a better design with some other contributors, but this requires more and different changes. We should not introduce an additional inconsistency by makingrand
ofUniform
special.
I haven't used Distributions extensively, but currently calling rand on Uniform{Float32} does also return a Float32. Regarding your comment on the len field, I don't see why this would be a problem. For most design patterns in Julia we may assume a user won't directly change any struct fields, right? If so then inconsistencies shouldn't be a problem.
currently calling rand on Uniform{Float32} does also return a Float32
@f-ij, was this a typo? It returns a Float64
currently:
julia> u = Uniform(1.0f0, 2.0f0)
Uniform{Float32}(a=1.0f0, b=2.0f0)
julia> rand(u)
1.8944514300368434
julia> typeof(ans)
Float64
currently calling rand on Uniform{Float32} does also return a Float32
@f-ij, was this a typo? It returns a
Float64
currently:julia> u = Uniform(1.0f0, 2.0f0) Uniform{Float32}(a=1.0f0, b=2.0f0) julia> rand(u) 1.8944514300368434 julia> typeof(ans) Float64
Sorry that was indeed a typo! I meant Normal{Float32}..
Normal
is the inconsistent one that does not follow the design. eltype
is supposed to specify the element type of the random variates, partype
is the type of the parameters. By design, they are intended to be distinct and eltype
is Float64
for continuous distributions and Int
for discrete distributions by default.
Changing rand
for Uniform
as suggested in this PR is not a fix since it just introduces another inconsistent distribution.
There are clear ideas for a better design and a push for making these changes.
What is the issue/PR to track/contribute to? Sorry @f-ij for sending you in the wrong direction. I was under the false impression that this would be uncontroversial.
There are two different aspects: The len
precomputation for repeated calls of rand
should be addressed separately from the general rand
interface design question.
IMO the most natural way to address the first one is to create a Distributions.Sampler
type for Uniform
that can be used for repeated sampling, since the same approach is also used in Random: If you want to perform repeated sampling with the same object, you use
spl = sampler(dist)
x = rand(spl)
for _ in 1:1_000_000
x += rand(spl)
end
instead of
x = rand(dist)
for _ in 1:1_000_000
x += rand(dist)
end
(FYI these samplers are already used internally when you create an array of samples; if there exists no sampler for a distribution, sampler(dist)
just falls back to dist
, ie. no special pre-computations are performed.) AFAIK there exists no issue or other PR for this possible performance improvement yet.
The design question is discussed in multiple issues and even PRs (#1433), but it's a bit scattered since multiple duplicate issues have been opened and discussions started from scratch or recircled to already existing explanations.
I did not know about sampler
yet, but that's a very good point.
What is the issue/PR to track/contribute to? Sorry @f-ij for sending you in the wrong direction. I was under the false impression that this would be uncontroversial.
Not a problem at all, it was not much work and I gained me some insight out of it.