`rand(dist)` does not consistently honor the element type of `dist`

Open vargonis opened this issue 5 years ago • 11 comments

One has

julia> typeof(rand(Uniform{Float32}(0f0,1f0)))
Float64

Similar inconsistencies occur with Exponential and Gamma, although not with Normal. Relatedly:

julia> eltype(Uniform{Float32}(0f0,1f0))
Float64
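
For illustration, the same inconsistency reproduced with Exponential (a sketch of the behaviour described in this report; outputs may differ across Distributions.jl versions):

julia> typeof(rand(Exponential{Float32}(1f0)))
Float64

julia> typeof(rand(Normal{Float32}(0f0, 1f0)))
Float32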

This is rather easy to fix. I've already taken care of the few cases that are currently producing errors in my code---take a look here. However, this is not complete and does not merit a PR.

vargonis avatar Feb 20 '20 09:02 vargonis

It's Normal that is the odd one out here. The element type of a distribution specifies the type of the parameters, not the type of the variates produced by rand. See https://github.com/JuliaStats/Distributions.jl/pull/1045#issuecomment-571559677
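
Concretely (a sketch; partype returns the type of the parameters, and outputs may differ across versions), the type parameter tracks the parameters while rand still produces Float64 variates:

julia> partype(Uniform{Float32}(0f0, 1f0))
Float32

julia> typeof(rand(Uniform{Float32}(0f0, 1f0)))
Float64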

andreasnoack avatar Feb 20 '20 10:02 andreasnoack

I've been pondering this and I'm not convinced. If the element type was intended to be the type of the parameters (of which there can be more than one!), I think that was a mistake that can easily be corrected. If anything, regarding the comment on #1045 you mention, one could perhaps assume that all parameters are, or can safely be cast to, Float64; in contrast, having control over the domain of the distribution, as reflected by the type one obtains upon sampling, is very important. The modifications I made, for instance, were motivated by the need to speed things up on GPUs, which work best with Float32s. I therefore think solution 3 in that comment is the right one, for both practical and purely mathematical reasons. Finally, without being an expert in these matters, it seems to me that the overhead would be negligible: one just needs to dispatch on that type, which one presumably already does because it is also the type of the parameters.
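
A minimal sketch of what such dispatch could look like, written outside Distributions.jl (MyUniform is a made-up type, not the package's implementation):

using Random

struct MyUniform{T<:AbstractFloat}
    a::T
    b::T
end

# The variate type follows the parameter type T.
Base.rand(rng::AbstractRNG, d::MyUniform{T}) where {T} = d.a + (d.b - d.a) * rand(rng, T)
Base.rand(d::MyUniform) = rand(Random.default_rng(), d)

typeof(rand(MyUniform(0f0, 1f0)))  # Float32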

vargonis avatar Feb 20 '20 21:02 vargonis

The overhead I'm referring to is overhead for the programmer. The complexity of managing your methods grows a lot in the number of type parameters that you have to support. Drawing Float32 variates is, of course, a reasonable use case but the question is how to support it. It could be as part of the distribution type or it could be part of the rand method. I suspect that the latter might be a simpler solution.
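
A sketch of the "part of the rand method" alternative (myrand is a hypothetical helper, not an API offered by Distributions.jl):

using Random, Distributions

# The caller requests the variate type in the call instead of via the
# distribution's type parameter.
myrand(rng::AbstractRNG, ::Type{T}, d::Uniform) where {T<:AbstractFloat} =
    T(minimum(d)) + (T(maximum(d)) - T(minimum(d))) * rand(rng, T)

myrand(Random.default_rng(), Float32, Uniform(0, 1))  # isa Float32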

andreasnoack avatar Feb 20 '20 21:02 andreasnoack

By the way, I just realized that there's the ValueSupport type, whose Discrete and Continuous subtypes pin the element type to Int or Float64. So here I'm really colliding with the limitations of Distributions' design... :/
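
In practice that means something like (a sketch; exact behaviour depends on the Distributions.jl version):

julia> eltype(Poisson())       # Discrete value support
Int64

julia> eltype(Exponential())   # Continuous value support
Float64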

vargonis avatar Feb 21 '20 07:02 vargonis

The original design of Distributions ended up being too strict, as it only allowed parameters and variates to be either Int or Float64. To support automatic differentiation, we've moved towards making most distributions parametric to allow the parameters to be dual numbers. We might need further adjustments of the design to handle e.g. GPU cases. We have also had discussions about simplifying the rich abstract type tree used in Distributions, as it doesn't help much and is sometimes a problem; see e.g. the discussion of the empirical distribution.
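
For example, this kind of differentiation relies on the parameters being parametric (a sketch assuming ForwardDiff is available):

using Distributions, ForwardDiff

# The mean is a dual number inside the closure, so Normal must accept
# non-Float64 parameters.
ForwardDiff.derivative(μ -> logpdf(Normal(μ, 1.0), 0.5), 0.0)  # ≈ 0.5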

andreasnoack avatar Feb 21 '20 08:02 andreasnoack

Looking forward to seeing the new design! I do hope it will allow for distributions with arbitrary support. Having just Int and Float64 is annoyingly limiting.

vargonis avatar Feb 21 '20 08:02 vargonis

This was solved by #951, before I stopped working on it, by inferring the return type from the parameter type where appropriate. It had been an ongoing bugbear of mine, but it didn't seem high on other people's priority lists... it seems like the kind of thing that should be fixed somehow before a v1.0 release, at any rate.

richardreeve avatar Mar 11 '20 07:03 richardreeve

This is still an issue vis-à-vis Dirichlet and Normal; see the discussion in #1338.

ztangent avatar Jun 03 '21 14:06 ztangent

This was solved by #951, before I stopped working on it, by inferring the return type from the parameter type where appropriate. It had been an ongoing bugbear of mine, but it didn't seem high on other people's priority lists... it seems like the kind of thing that should be fixed somehow before a v1.0 release, at any rate.

I don't know how we want to define eltype, but I do think we need to settle on a single consistent definition and enforce it.

ParadaCarleton avatar Sep 08 '23 00:09 ParadaCarleton

I'd like to ask for a decision on this again. Besides ruling out GPU support, machine-learning applications, compatibility with other packages (which try to preserve types), and precisions other than 64 bits, the inconsistent behavior of rand and eltype makes property-based testing substantially harder.
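
For example, here is a property one might want in a test suite (a sketch; with the current behaviour it does not hold for all distributions):

using Distributions

# Check whether variates have the same type as the parameters.
for d in (Normal{Float32}(0f0, 1f0), Uniform{Float32}(0f0, 1f0), Exponential{Float32}(1f0))
    println(typeof(d), " honors its parameter type: ", typeof(rand(d)) == partype(d))
end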

ParadaCarleton avatar Oct 23 '23 01:10 ParadaCarleton

Is there any progress on this or maybe a way to help to get this resolved?

To add to this (I've read many but not all of the linked issues so far), the current behaviour breaks the promise from the standard library that rand!(rng, A, S) is the same as copyto!(A, rand(rng, S, size(A))), except for the allocation of an additional array.

Compare:

julia> using Random
julia> using Distributions
julia> T = Float32
julia> n = 3
julia> dist = Uniform{T}(zero(T), one(T))
julia> A = Vector{T}(undef, n)
julia> rand!(Random.Xoshiro(1), dist, A)
julia> B = Vector{T}(undef, n)
julia> copyto!(B, rand(Random.Xoshiro(1), dist, n))
julia> print(A == B)
false

(I guess it doesn't make much sense to open another issue for this inconsistency, which is most likely caused by the other problems with the return types?)

Thank you for putting so much work into this library! 🙂

EDIT: For context, I want to be able to memory-map large arrays to save some RAM, but in order to do that I have to use rand! to write into the memory-mapped array. At the same time, I need random number generation to be reproducible. Right now, this does not work as expected due to the above problem.

EDIT': Of course I can work around this; I just wanted to point out the inconsistency with the standard library.
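
(One sketch of such a workaround, assuming a temporary Float64 buffer of the full length is acceptable, which admittedly undercuts the point of memory-mapping:)

using Random, Distributions

T = Float32
n = 3
dist = Uniform{T}(zero(T), one(T))
A = Vector{T}(undef, n)                 # stands in for the memory-mapped array
buf = rand(Random.Xoshiro(1), dist, n)  # Float64 draws, same stream as the copyto! path
A .= T.(buf)                            # convert while writing into the target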

dpaetzel avatar Jul 18 '24 18:07 dpaetzel