`rand(dist)` does not consistently honor the element type of `dist`
One has

```julia
julia> typeof(rand(Uniform{Float32}(0f0, 1f0)))
Float64
```
Similar inconsistencies occur with Exponential and Gamma, although not with Normal (a quick check is shown below). Relatedly:
```julia
julia> eltype(Uniform{Float32}(0f0, 1f0))
Float64
```
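For completeness, a quick check across the distributions mentioned above (output as I observe it; Normal is indeed the only one that honors the parameter type):

```julia
julia> using Distributions

julia> typeof(rand(Exponential{Float32}(1f0)))
Float64

julia> typeof(rand(Gamma{Float32}(1f0, 1f0)))
Float64

julia> typeof(rand(Normal{Float32}(0f0, 1f0)))
Float32
```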
This is rather easy to fix. I've already taken care of the few cases that were producing errors in my own code (take a look here), but my changes are not complete and don't merit a PR.
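To give an idea of the kind of fix I mean, here is a minimal sketch for Uniform, assuming its endpoints are stored in the fields `a` and `b` (as in the current source); the function name is mine, and the real fix would replace the existing `rand` method instead:

```julia
using Random, Distributions

# Sketch: draw in the parameter's float type instead of always Float64.
function myrand(rng::Random.AbstractRNG, d::Uniform{T}) where {T<:AbstractFloat}
    return d.a + (d.b - d.a) * rand(rng, T)
end
```

With this, `myrand(Random.default_rng(), Uniform{Float32}(0f0, 1f0))` returns a Float32.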
It's Normal that is the odd one out here. The element type of a distribution specifies the type of the parameters, not the type of the variates produced by rand. See https://github.com/JuliaStats/Distributions.jl/pull/1045#issuecomment-571559677
I've been pondering this and I'm not convinced. If the element type was intended to be the type of the parameters (of which there can be more than one!), I think that was a mistake that can easily be corrected. If anything, regarding the comment on #1045 you mention, one could perhaps assume that all parameters are, or can be safely cast to, Float64; in contrast, having control over the domain of the distribution, as reflected by the type one obtains upon sampling, is very important. The modifications I made, for instance, were motivated by the need to speed things up using GPUs, which work best with Float32s. I therefore think solution 3 in that comment is the right one, for both practical and purely mathematical reasons. Finally, without being an expert in these matters, it seems to me that the overhead would be negligible (one just needs to dispatch on that type, which one presumably already does, because it is also the type of the parameters).
The overhead I'm referring to is overhead for the programmer. The complexity of managing your methods grows a lot in the number of type parameters that you have to support. Drawing Float32 variates is, of course, a reasonable use case but the question is how to support it. It could be as part of the distribution type or it could be part of the rand method. I suspect that the latter might be a simpler solution.
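To make the alternative concrete, here is a sketch of what the rand-method approach could look like; the name `typed_rand` and its signature are hypothetical, and nothing like this exists in Distributions today:

```julia
using Random, Distributions

# Hypothetical API: request the variate type at the call site rather
# than encoding it in the distribution's type parameter.
function typed_rand(rng::Random.AbstractRNG, ::Type{T}, d::Uniform) where {T<:AbstractFloat}
    return T(d.a) + (T(d.b) - T(d.a)) * rand(rng, T)
end

# Usage: typed_rand(Random.default_rng(), Float32, Uniform(0.0, 1.0))
```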
By the way, I just realized that there's the ValueSupport type, which pins the element type to either Int or Float64 (see below). So here I'm really colliding with the limitations of Distributions' design... :/
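Concretely, eltype appears to be wired to the ValueSupport parameter rather than to the parameter type; as far as I can tell, on current versions:

```julia
julia> using Distributions

julia> eltype(Uniform{Float32})  # Continuous => hardwired to Float64
Float64

julia> eltype(Poisson{Float32})  # Discrete => hardwired to Int
Int64
```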
The original design of Distributions ended up being too strict, as it only allowed parameters and variates to be either Int or Float64. To support automatic differentiation, we've moved towards making most distributions parametric so that the parameters can be dual numbers. We might need further adjustments to the design to handle e.g. GPU cases. We have also had discussions about simplifying the rich abstract type tree used in Distributions, since it doesn't help much and is sometimes a problem; see e.g. the discussion of the empirical distribution.
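For readers who haven't followed that work, a small illustration of the automatic-differentiation use case that the parametric design enables (using ForwardDiff; the function name is just for the example):

```julia
using Distributions, ForwardDiff

# Normal can hold a dual-number parameter, so we can differentiate a
# log-likelihood with respect to μ.
loglik(μ) = logpdf(Normal(μ, 1.0), 0.5)

ForwardDiff.derivative(loglik, 0.0)  # (x - μ)/σ² at μ = 0 gives 0.5
```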
Looking forward to seeing the new design! I do hope it will allow for distributions with arbitrary support; having just Int and Float64 is annoyingly limiting.
This was solved by #951, before I stopped working on it, by imputing the return type from the parameter type where appropriate. It had been an ongoing bugbear of mine, but it didn't seem high on other people's priority lists... it seems like the kind of thing that should be fixed somehow before a v1.0 release, at any rate.
This is still an issue vis-à-vis Dirichlet and Normal; see the discussion in #1338.
> This was solved by #951, before I stopped working on it, by imputing the return type from the parameter type where appropriate. It had been an ongoing bugbear of mine, but it didn't seem high on other people's priority lists... it seems like the kind of thing that should be fixed somehow before a v1.0 release, at any rate.
I don't know how we want to define eltype, but I do think we need to settle on a single consistent definition and enforce it.
I'd like to ask for a decision on this again. Besides ruling out GPU support, machine learning applications, compatibility with other packages (which try to preserve types), and precisions other than 64 bits, the inconsistent behavior of rand and eltype makes property-based testing substantially harder (see the sketch below).
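For illustration, the kind of property one would like to assert; as per the discussion above, this currently fails for Float32:

```julia
using Distributions, Test

# Property: sampling and eltype honor the distribution's parameter type.
for T in (Float32, Float64)
    d = Uniform{T}(zero(T), one(T))
    @test rand(d) isa T    # currently fails for T == Float32
    @test eltype(d) === T  # currently fails for T == Float32
end
```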
Is there any progress on this or maybe a way to help to get this resolved?
To add to this (I've read many but not all of the linked issues so far): the current behaviour breaks the promise from the standard library that `rand!(rng, A, S)` is the same as `copyto!(A, rand(rng, S, size(A)))` except for the allocation of an additional array.
Compare:

```julia
julia> using Random, Distributions

julia> T = Float32;

julia> n = 3;

julia> dist = Uniform{T}(zero(T), one(T));

julia> A = Vector{T}(undef, n);

julia> rand!(Random.Xoshiro(1), dist, A);

julia> B = Vector{T}(undef, n);

julia> copyto!(B, rand(Random.Xoshiro(1), dist, n));

julia> print(A == B)
false
```
(I guess it doesn't make much sense to open another issue for this inconsistency, which is most likely caused by the other problems with the return types?)
Thank you for putting so much work into this library! :slightly_smiling_face:
EDIT: For context: I want to be able to memory-map large arrays to save some RAM, but in order to do that I have to use rand! to write into the memory-mapped array. At the same time, I need random number generation to be reproducible. Right now, this does not work as expected due to the above problem. A sketch of the workflow is below.
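Roughly, the workflow I have in mind looks like this (a sketch using the standard library's Mmap; the file name and sizes are arbitrary):

```julia
using Mmap, Random, Distributions

# Sketch: fill a memory-mapped Float32 array in place with rand!, so
# the draws never materialize as a separate in-RAM array. Reproducing
# them later via rand(rng, dist, n) requires the two code paths to
# agree, which is exactly what breaks above.
io = open("draws.bin", "w+")
A = mmap(io, Vector{Float32}, (10^6,))
rand!(Random.Xoshiro(1), Uniform{Float32}(0f0, 1f0), A)
Mmap.sync!(A)  # flush the draws to disk
close(io)
```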
EDIT 2: Of course I can work around this; I just wanted to point out the inconsistency with the standard library.