MeasureTheory.jl
Analytical form for convolution and bind of two Normal distributions
Hi Chad,
I am following up here on our discussion from Zulip. I followed your advice therein. Let me explain a couple more points of this PR:
- It defines `convolve` as a new combinator. The reason is that for this problem, the Normal->Normal bind is really just the convolution of two Normals.
- In Zulip you suggested using `mean(d)` and `std(d)` to make the code more generic. Currently these would throw a method error, so I had to add some code for the calculation of `mean` and `std` for `Affine`(ly transformed) measures.
- Compared to the Zulip discussion, I also added code to dispatch when σ (of the Normal kernel) is fixed, but possibly unequal to 1. This relates to a conversation we had a while ago about constraints in kernels/likelihoods.
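For reference, the closed form behind the Normal->Normal bind can be sketched in a few lines of plain Python (illustrative only; `bind_normal_normal` is my own name, not part of the package):

```python
import math

# Sketch of the closed form this PR exploits: binding Normal(mu0, sigma0)
# with the kernel x -> Normal(x, sigma) yields Normal(mu0, sqrt(sigma0^2 + sigma^2)),
# i.e. the convolution of Normal(mu0, sigma0) with Normal(0, sigma).
def bind_normal_normal(mu0, sigma0, sigma):
    return (mu0, math.sqrt(sigma0 ** 2 + sigma ** 2))

mu, sd = bind_normal_normal(0.0, 1.0, 1.0)
print(mu, sd)  # 0.0 and sqrt(2) ≈ 1.41421, matching the REPL example below
```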
One possible concern (though not a concern for my use case) is the following:
```julia
julia> Normal(0,1) ↣ MeasureTheory.kernel(Normal{(:μ,)}, μ=identity)
Normal(μ = 0.0, σ = 1.41421)

julia> Normal(0,1) ↣ MeasureTheory.kernel(Normal{(:μ,)})
MeasureBase.Bind{Normal{(:μ, :σ), Tuple{Int64, Int64}}, ParameterizedTransitionKernel{Type{Normal}, ComposedFunction{typeof(params), MeasureBase.var"#f#8"{Normal, (:μ,)}}, (:μ,), Tuple{MeasureBase.var"#54#56"{Int64}}}}(
Normal(μ = 0, σ = 1),
ParameterizedTransitionKernel(Normal, MeasureBase.params ∘ MeasureBase.var"#f#8"{Normal, (:μ,)}(), (μ = MeasureBase.var"#54#56"{Int64}(1),)))
```
Let me know what you think!
Codecov Report
Merging #226 (06b8933) into master (6a42ef6) will increase coverage by 0.37%. The diff coverage is 66.66%.

```diff
@@            Coverage Diff             @@
##           master     #226      +/-   ##
==========================================
+ Coverage   44.60%   44.98%   +0.37%
==========================================
  Files          42       44       +2
  Lines        1307     1325      +18
==========================================
+ Hits          583      596      +13
- Misses        724      729       +5
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/MeasureTheory.jl | 50.00% <ø> (ø) | |
| src/combinators/convolve.jl | 0.00% <0.00%> (ø) | |
| src/combinators/affine.jl | 53.88% <85.71%> (+1.28%) | :arrow_up: |
| src/parameterized/pairwise/normal_normal.jl | 100.00% <100.00%> (ø) | |
| src/distproxy.jl | 50.00% <0.00%> (+16.66%) | :arrow_up: |
Thanks @nignatiadis !
I see how the result in the normal case is equivalent to a convolution, but more generally I think bind builds a compound distribution. So for us, maybe a compound measure?
There's a nice overview here. Eventually (though not necessarily this PR, unless you want to) we should have all of the examples in that article as unit tests :)
Yes exactly! A lot of my research is about such compound distributions. I plan to add a few more to this PR (though not all in the Wikipedia list).
Also, I had never heard the term bind used before for this operation, but then noticed that bind in the package had exactly the semantics I was interested in. Another way to think about this is as a composition operation on Markov kernels (see e.g. the following, as well as the section just above it).
In my papers I typically just call it the marginal distribution (where "marginal" refers to the fact that we start with a joint distribution on (X, Y), with samples generated as X ~ μ, Y ~ k(X), for a measure μ and a kernel k, but are then only interested in the marginal of Y). [Calling this a marginal distribution in this package would cause ambiguity.]
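In a discrete state space, this marginal is just mass pushed through the kernel; here is a small illustrative sketch in plain Python (the `marginal` helper is hypothetical, not package code):

```python
# X ~ mu, Y ~ k(X); we keep only the marginal of Y.
# mu is a dict state -> probability; k is a dict state -> (dict state -> probability).
def marginal(mu, k):
    out = {}
    for x, px in mu.items():
        for y, pyx in k[x].items():
            out[y] = out.get(y, 0.0) + px * pyx
    return out

mu = {"a": 0.5, "b": 0.5}
k = {"a": {"u": 1.0}, "b": {"u": 0.5, "v": 0.5}}
print(marginal(mu, k))  # {'u': 0.75, 'v': 0.25}
```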
Some background on bind...
In functional programming, a monad is a type constructor m that has two operations:
```haskell
unit :: a -> m a
bind :: m a -> (a -> m b) -> m b
```
bind is usually written as >>=, though that's awkward to write and Julia doesn't allow it. For a quick workaround I used ↣, though I'm not sure we should stay with that.
Monads are also required to satisfy the "monad laws":
```haskell
unit(x) >>= f == f(x)
m >>= unit == m
m >>= (x -> (f(x) >>= g)) == (m >>= f) >>= g
```
For us,
- `m` is $\mathcal{M}$, where $\mathcal{M}(X)$ is "measures on $X$" (mathematically -- we don't have this notation in code)
- `unit` is `Dirac`
- `bind` is... well, currently it's `bind`, but maybe it should be `compound` or something more measure-specific?
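To make the laws concrete, here is a toy finite-distribution monad in plain Python (the names `unit` and `bind` mirror the Haskell signatures above; none of this is MeasureTheory.jl code):

```python
# A "measure" is a dict value -> mass, unit is a Dirac at x,
# and bind pushes mass through a kernel f: value -> measure.
def unit(x):
    return {x: 1.0}

def bind(m, f):
    out = {}
    for x, px in m.items():
        for y, pyx in f(x).items():
            out[y] = out.get(y, 0.0) + px * pyx
    return out

f = lambda x: {x: 0.5, x + 1: 0.5}
g = lambda x: {2 * x: 1.0}
m = {0: 0.25, 1: 0.75}

# The three monad laws, checked pointwise on finite supports:
assert bind(unit(3), f) == f(3)                                  # left identity
assert bind(m, unit) == m                                        # right identity
assert bind(m, lambda x: bind(f(x), g)) == bind(bind(m, f), g)   # associativity
```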
This package also aims at a very similar task: https://github.com/filtron/MarkovKernels.jl
Good find! That looks very useful
@filtron
Hello. Can I be of assistance somehow?
I think you can bear witness to us appreciating your package, and possibly help figure out whether there is some problem we share
Pardon my late reply. I am quite new to Julia so encouragement + feedback is much appreciated. Regarding discussing issues we might share, I feel like I am missing some context. We are both interested in convolving Normal distributions but it is not clear to me what the issue is beyond that?
@filtron: Thank you for your response! I was very excited to see your MarkovKernels.jl package; thanks for writing it!
Here's the reason I initially tagged your package in this thread: MeasureTheory.jl and associated packages (e.g., MeasureBase.jl) seek to provide abstractions to work with measures and reason about them in a rigorous way in Julia, see e.g., the following article. One abstraction in these packages is that of Markov kernels and likelihoods, but part of that abstraction is still not set in stone 100% (I think; Chad and Moritz please correct me). Your package has these abstractions as well, along with very useful functionality, and it would be awesome if it could be provided as part of MeasureTheory.jl or using the MeasureTheory.jl interface (which perhaps would require some changes to satisfy requirements underlying your package; a discussion here could be fruitful).
I would be happy to help get any desired functionality from MarkovKernels.jl into MeasureTheory.jl. I suppose concretely what we are talking about is the concepts I call `compose`, `marginalise`, and `invert` for, by the looks of it, Normal (kernels) in Cholesky parametrization?
If you want, have a look at https://arxiv.org/abs/2010.03509. We also end up with such primitives, which describe Bayesian inference in tree models (you can formulate a state-space model this way too):
Pullbacks of likelihoods
$$ h_k (x) = \int k(y\mid x) h(y) \,\mathrm{d} y,\quad (1) $$
For example, the fusion step of the Kalman filter is a form of step (1).
With that, you can tilt kernels with likelihoods
$$ k^h(y \mid x) = \frac{k(y\mid x) h(y)}{h_k(x)}, \quad (2) $$
and marginalisation
$$ k^h (y) = \int k^h(y\mid x) \pi(x) \mathrm{d} x, \quad (3) $$
Together, (1), (2), and (3) allow you to describe an RTS-type smoother made of a backward filter with fusion step (1) and a forward smoother (2).
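For concreteness, here is what (1)-(3) look like over a discrete state space, sketched in plain Python (the function names `pullback`, `tilt`, and `marginalise` are my own, not from either package):

```python
# Kernels are dicts x -> {y: k(y|x)}; likelihoods are dicts y -> h(y).

def pullback(k, h):
    # (1) h_k(x) = sum_y k(y|x) h(y)
    return {x: sum(kyx * h.get(y, 0.0) for y, kyx in ky.items())
            for x, ky in k.items()}

def tilt(k, h, hk):
    # (2) k^h(y|x) = k(y|x) h(y) / h_k(x)
    return {x: {y: kyx * h.get(y, 0.0) / hk[x] for y, kyx in ky.items()}
            for x, ky in k.items()}

def marginalise(kh, pi):
    # (3) k^h(y) = sum_x k^h(y|x) pi(x)
    out = {}
    for x, px in pi.items():
        for y, p in kh[x].items():
            out[y] = out.get(y, 0.0) + px * p
    return out

k = {"a": {"u": 0.5, "v": 0.5}, "b": {"v": 1.0}}
h = {"u": 2.0, "v": 1.0}
pi = {"a": 0.5, "b": 0.5}

hk = pullback(k, h)                # {'a': 1.5, 'b': 1.0}
kh = tilt(k, h, hk)                # tilted kernel, normalized per x
m = marginalise(kh, pi)            # marginal of the tilted kernel under pi
print(hk, m)
```

Since each tilted kernel `kh[x]` is normalized and `pi` is a probability vector, the resulting marginal `m` sums to 1.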
I have been meaning to get back to you on this. Please don't mistake my tardiness for disinterest. Doing inference on trees by means of the h-transform is indeed an interesting development. However, not to come off as pedantic, but in my mind there are three ways of doing inference in partially observed Markov processes, namely:
- Filter forward, predict backwards (i.e. RTS)
- Filter forward and "filter" backwards, correct forward filter with backward filter (i.e. two-filter smoother)
- "Filter" backward, predict forwards (i.e. Doob's h-transform)
Formally, these alternatives produce the same output. However, algorithmically they are quite different to me and offer different avenues for developing approximate inference methods. The above looks more like alternative 3 to me (I guess 1 & 2 would also have tree analogues).
In any case, I would like to re-iterate that I would be pleased to collaborate.