Distributions.jl icon indicating copy to clipboard operation
Distributions.jl copied to clipboard

Computed field len and rand now returns type T in Uniform{T}

Open f-ij opened this issue 1 year ago • 13 comments

Added computed field len which is the length of the interval to be sampled from. Also replaced all instances of (d.b-d.a) by the new field d.len.

Fixed an issue where rand(::Uniform{Float32}) (and other, non Float64 types) did not in fact return a Float32, but a Float64, which caused some performance issues. This was done by modifiying the method rand(rng::AbstractRNG, d::Uniform) to include the subtype of Uniform and pass it to rand.

This was discussed on discourse

f-ij avatar Nov 06 '23 18:11 f-ij

The rand method follows the current design, currently almost all distributions return samples of type Float64 and Int and intentionally the random variate type is independent of the parameter types (there are many issues and discussions in the repo). I would like to improve this design, and I've discussed a better design with some other contributors, but this requires more and different changes. We should not introduce an additional inconsistency by making rand of Uniform special.

devmotion avatar Nov 06 '23 18:11 devmotion

Regarding len: IMO we should not add such a field since it is not an actual parameter, could easily cause inconsistencies and I think your example should be fixed differently. As in Random, you should construct a sampler when you perform repeated sampling (unfortunately, the syntax is a bit different in Distributions, possibly because it predates (?) the interface in Random). len should only be computed when constructing such a sampler.

devmotion avatar Nov 06 '23 18:11 devmotion

In particular, this corresponds to issue #1783.

stevengj avatar Nov 06 '23 19:11 stevengj

With the current design, the "correct" fix would be to always return Float64 as the variates are supposed to be of type Float64. But as said, I'd rather fix this design. I'll try to write up our suggestions and draft a PR within the next few days.

devmotion avatar Nov 06 '23 19:11 devmotion

Why would one expect a Float64 result when sampling a distribution with parameters only specified to Float32 precision, especially if the parameters are from the same space as the random variable?

I think that API to specify the output type would be nice, but in the absence of such an initiative and given the fact that there is no promise regarding Float64 output, the proposed change looks reasonable to me.

simsurace avatar Nov 06 '23 19:11 simsurace

The rand method follows the current design, currently almost all distributions return samples of type Float64 and Int and intentionally the random variate type is independent of the parameter types (there are many issues and discussions in the repo). I would like to improve this design, and I've discussed a better design with some other contributors, but this requires more and different changes. We should not introduce an additional inconsistency by making rand of Uniform special.

I haven't used Distributions extensively, but currently calling rand on Uniform{Float32} does also return a Float32. Regarding your comment on the len field, I don't see why this would be a problem. For most design patterns in Julia we may assume a user won't directly change any struct fields, right? If so then inconsistencies shouldn't be a problem.

f-ij avatar Nov 06 '23 19:11 f-ij

currently calling rand on Uniform{Float32} does also return a Float32

@f-ij, was this a typo? It returns a Float64 currently:

julia> u = Uniform(1.0f0, 2.0f0)
Uniform{Float32}(a=1.0f0, b=2.0f0)

julia> rand(u)
1.8944514300368434

julia> typeof(ans)
Float64

stevengj avatar Nov 06 '23 21:11 stevengj

currently calling rand on Uniform{Float32} does also return a Float32

@f-ij, was this a typo? It returns a Float64 currently:

julia> u = Uniform(1.0f0, 2.0f0)
Uniform{Float32}(a=1.0f0, b=2.0f0)

julia> rand(u)
1.8944514300368434

julia> typeof(ans)
Float64

Sorry that was indeed a typo! I meant Normal{Float32}..

f-ij avatar Nov 06 '23 22:11 f-ij

Normal is the inconsistent one that does not follow the design. eltype is supposed to specify the element type of the random variates, partype is the type of the parameters. By design, they are intended to be distinct and eltype is Float64 for continuous distributions and Int for discrete distributions by default.

Changing rand for Uniform as suggested in this PR is not a fix since it just introduces another inconsistent distribution.

There are clear ideas for a better design and a push for making these changes.

devmotion avatar Nov 06 '23 23:11 devmotion

What is the issue/PR to track/contribute to? Sorry @f-ij for sending you in the wrong direction. I was under the false impression that this would be uncontroversial.

simsurace avatar Nov 07 '23 08:11 simsurace

There are two different aspects: The len precomputation for repeated calls of rand should be addressed separately from the general rand interface design question.

IMO the most natural way to address the first one is to create a Distributions.Sampler type for Uniform that can be used for repeated sampling, since the same approach is also used in Random: If you want to perform repeated sampling with the same object, you use

spl = sampler(dist)
x = rand(spl)
for _ in 1:1_000_000
    x += rand(spl)
end

instead of

x = rand(dist)
for _ in 1:1_000_000
    x += rand(dist)
end

(FYI these samplers are already used internally when you create an array of samples; if there exists no sampler for a distribution, sampler(dist) just falls back to dist, ie. no special pre-computations are performed.) AFAIK there exists no issue or other PR for this possible performance improvement yet.

The design question is discussed in multiple issues and even PRs (#1433), but it's a bit scattered since multiple duplicate issues have been opened and discussions started from scratch or recircled to already existing explanations.

devmotion avatar Nov 07 '23 09:11 devmotion

I did not know about sampler yet, but that's a very good point.

simsurace avatar Nov 07 '23 10:11 simsurace

What is the issue/PR to track/contribute to? Sorry @f-ij for sending you in the wrong direction. I was under the false impression that this would be uncontroversial.

Not a problem at all, it was not much work and I gained me some insight out of it.

f-ij avatar Nov 07 '23 15:11 f-ij