Distributions.jl
Distributions.jl copied to clipboard
Generic sum and product distributions
At a recent tutorial, someone brought up the following possibility for providing a nice syntax for location-scale distributions:
X = Normal(0, 1)
Y = 5X + 1
They also asked about adding two distributions together:
Z = Normal(1, 1) + Normal(2, 3)
I think we should provide a generic univariate SumDistribution
and ProductDistribution
, which we can special case for things like Gaussians. I am already working on the location-scale case and will open a PR today/tomorrow for it.
For distributions controlled by location and scale parameters, it would be good to provide generic interfaces like:
scale(d) # return scale param
location(d) # return location param
For such distributions, things like a * X + b
should be straight forward to implement. And we can go ahead to implement them.
Generally, sum/product of two random variables usually follows a quite complicated distribution with no nice forms. We probably need deeper thoughts on what is the correct interface.
Yes, I'm implementing scale
, location
, rescale
, relocate
and then *
and +
in terms of rescale
and relocate
.
My current thought is that the generic sum of RV's maintains an array of Distribution objects. This won't allow one to perform most functions, but it will at least provide rand
. Other data structures might be superior.
Does Z = Normal(1,1) + Normal(2,3)
mean that Z
is a new distribution whose values are sums of independent samples from Normal(1,1)
and Normal(2,3)
? Or does it mean that Z
is a 50:50 mixture of those two normals? The way I'm reading this, it would be the former, which is sensible.
The +
notation means that each draw of Z is a sum of two other RV's, not a randomly selected one of the two. We have another set of functions for constructing mixtures, which I'm hoping to revise soon.
Cool. This is shaping up to be one of the sweetest Julia packages.
Hi, just wondering if there's any update on this. I was thinking about how to implement a product distribution, but my intuition was that a product should be in the sense of product measure, so the dimensionalities add, maybe by way of a tuple.
The idea of adding or multiplying distributions as a convolution only works if you can guarantee independent of the components, and I've seen this trip people up a few times. Also, there are only a few stable distributions; in most cases the sum or product can't be represented in a convenient closed form.
I'm afraid that is not the case. It is hard to know how well it could work in practice it would be great if you could do some experiments and report back.
I'm getting caught up a little bit by the types, still getting used to some fine points of Julia. Rather than having everything be abstract, I was considering having a parameterized CustomDistribution
type, with every method specified as a component of the defining struct
. Seems like that could help with type stability. Then maybe a @productDistribution
macro could assemble things for individual cases. I'll see how it goes.
Hmm, it might be even simpler than this. It works just fine to build a product as a tuple:
entropy(d :: Tuple{Distribution, Distribution, Distribution}) = sum(entropy.(d))
mean(d :: Tuple{Distribution, Distribution, Distribution}) = mean.(d)
Then we can write
Main> d = (Normal(), Uniform(0,1), Poisson(1.3))
(Distributions.Normal{Float64}(μ=0.0, σ=1.0), Distributions.Uniform{Float64}(a=0.0, b=1.0), Distributions.Poisson{Float64}(λ=1.3))
Main> mean(d)
(0.0, 0.5, 1.3)
Main> entropy(d)
2.8753527708838638
I'm not sure yet about overhead; maybe the types are too generic.
Some progress has been made in the way of a convolution framework: see pull request #919. I think we could have the ∗
(\ast
) operator as our convolution operator.
The main downside is that it looks a lot like *
(multiplication).
I took ˋ⊕ˋ as symbol (for the independent sum of random variables) in https://github.com/mschauer/GaussianDistributions.jl/blob/master/src/GaussianDistributions.jl .
Yeah, I like ˋ⊕ˋ as an alternative. Either way, I think this should be part of this library. Does this need to wait until #880?
Is there an update on weighting mixture models? Currently:
MixtureModel(Normal[ Normal(2, 3), Normal(3, 4) ])
is just a 50/50 mixture with no way of specifying the weights.
Or perhaps there is a work around for this if I want to specify a weighted mixture in a model
Something like ...
x~MixtureModel(Normal[
Normal(2, 3),
Normal(3, 4)
], weights=[0.2, 0.4])
that would be equivalent to:
pdf(Normal(2, 3), x)*0.2.+pdf(Normal(3, 4), x)*0.4
Thank you,
You can specify the mixture weights as an additional argument. Some examples are shown in https://juliastats.org/Distributions.jl/stable/mixture/#Constructors.
Thank you for the prompt response. Though the constructor requires the prior probabilities to add up to 1. What if the weights do not necessarily add up to 1?
I suppose we can just normalize the weights so that they add up to 1 and multiply by the scaling factor. It would be great if the mixture model can do this by default rather than requiring prior probabilities.
Hi, I'm not sure I understood the bottom line. Given a vector of univariate distributions X_1, ..., X_n (or an equivalent multivariate distribution) and a function f: R^n -> R , is there a method to create a new univariate distribution f(X_1, ..., X_n) ?