Distributions.jl Computed field len and rand now returns type T in Uniform{T}

Added computed field len which is the length of the interval to be sampled from. Also replaced all instances of (d.b-d.a) by the new field d.len.

Fixed an issue where rand(::Uniform{Float32}) (and other, non Float64 types) did not in fact return a Float32, but a Float64, which caused some performance issues. This was done by modifiying the method rand(rng::AbstractRNG, d::Uniform) to include the subtype of Uniform and pass it to rand.

This was discussed on discourse

Nov 06 '23 18:11 f-ij

The rand method follows the current design, currently almost all distributions return samples of type Float64 and Int and intentionally the random variate type is independent of the parameter types (there are many issues and discussions in the repo). I would like to improve this design, and I've discussed a better design with some other contributors, but this requires more and different changes. We should not introduce an additional inconsistency by making rand of Uniform special.

Nov 06 '23 18:11 devmotion

Regarding len: IMO we should not add such a field since it is not an actual parameter, could easily cause inconsistencies and I think your example should be fixed differently. As in Random, you should construct a sampler when you perform repeated sampling (unfortunately, the syntax is a bit different in Distributions, possibly because it predates (?) the interface in Random). len should only be computed when constructing such a sampler.

Nov 06 '23 18:11 devmotion

In particular, this corresponds to issue #1783.

Nov 06 '23 19:11 stevengj

With the current design, the "correct" fix would be to always return Float64 as the variates are supposed to be of type Float64. But as said, I'd rather fix this design. I'll try to write up our suggestions and draft a PR within the next few days.

Nov 06 '23 19:11 devmotion

Why would one expect a Float64 result when sampling a distribution with parameters only specified to Float32 precision, especially if the parameters are from the same space as the random variable?

I think that API to specify the output type would be nice, but in the absence of such an initiative and given the fact that there is no promise regarding Float64 output, the proposed change looks reasonable to me.

Nov 06 '23 19:11 simsurace

The rand method follows the current design, currently almost all distributions return samples of type Float64 and Int and intentionally the random variate type is independent of the parameter types (there are many issues and discussions in the repo). I would like to improve this design, and I've discussed a better design with some other contributors, but this requires more and different changes. We should not introduce an additional inconsistency by making rand of Uniform special.

I haven't used Distributions extensively, but currently calling rand on Uniform{Float32} does also return a Float32. Regarding your comment on the len field, I don't see why this would be a problem. For most design patterns in Julia we may assume a user won't directly change any struct fields, right? If so then inconsistencies shouldn't be a problem.

Nov 06 '23 19:11 f-ij

currently calling rand on Uniform{Float32} does also return a Float32

@f-ij, was this a typo? It returns a Float64 currently:

julia> u = Uniform(1.0f0, 2.0f0)
Uniform{Float32}(a=1.0f0, b=2.0f0)

julia> rand(u)
1.8944514300368434

julia> typeof(ans)
Float64

Nov 06 '23 21:11 stevengj

currently calling rand on Uniform{Float32} does also return a Float32

@f-ij, was this a typo? It returns a Float64 currently:
julia> u = Uniform(1.0f0, 2.0f0)
Uniform{Float32}(a=1.0f0, b=2.0f0)

julia> rand(u)
1.8944514300368434

julia> typeof(ans)
Float64

Sorry that was indeed a typo! I meant Normal{Float32}..

Nov 06 '23 22:11 f-ij

Normal is the inconsistent one that does not follow the design. eltype is supposed to specify the element type of the random variates, partype is the type of the parameters. By design, they are intended to be distinct and eltype is Float64 for continuous distributions and Int for discrete distributions by default.

Changing rand for Uniform as suggested in this PR is not a fix since it just introduces another inconsistent distribution.

There are clear ideas for a better design and a push for making these changes.

Nov 06 '23 23:11 devmotion

What is the issue/PR to track/contribute to? Sorry @f-ij for sending you in the wrong direction. I was under the false impression that this would be uncontroversial.

Nov 07 '23 08:11 simsurace

There are two different aspects: The len precomputation for repeated calls of rand should be addressed separately from the general rand interface design question.

IMO the most natural way to address the first one is to create a Distributions.Sampler type for Uniform that can be used for repeated sampling, since the same approach is also used in Random: If you want to perform repeated sampling with the same object, you use

spl = sampler(dist)
x = rand(spl)
for _ in 1:1_000_000
    x += rand(spl)
end

instead of

x = rand(dist)
for _ in 1:1_000_000
    x += rand(dist)
end

(FYI these samplers are already used internally when you create an array of samples; if there exists no sampler for a distribution, sampler(dist) just falls back to dist, ie. no special pre-computations are performed.) AFAIK there exists no issue or other PR for this possible performance improvement yet.

The design question is discussed in multiple issues and even PRs (#1433), but it's a bit scattered since multiple duplicate issues have been opened and discussions started from scratch or recircled to already existing explanations.

Nov 07 '23 09:11 devmotion

I did not know about sampler yet, but that's a very good point.

Nov 07 '23 10:11 simsurace

What is the issue/PR to track/contribute to? Sorry @f-ij for sending you in the wrong direction. I was under the false impression that this would be uncontroversial.

Not a problem at all, it was not much work and I gained me some insight out of it.

Nov 07 '23 15:11 f-ij

Distributions.jl Distributions.jl copied to clipboard

Computed field len and rand now returns type T in Uniform{T}

Distributions.jl
Distributions.jl copied to clipboard