
analytical form for convolution and bind of two Normal distributions


Hi Chad,

I am following up here on our discussion on Zulip, where I followed your advice. Let me explain a couple of points about this PR:

  1. It defines convolve as a new combinator, since for this problem the Normal->Normal bind is really just the convolution of two Normals (see the sketch after this list).

  2. On Zulip you suggested using mean(d) and std(d) to make the code more generic. Currently these would throw a MethodError, so I added code to compute mean and std for Affine(ly transformed) measures.

  3. Compared to the Zulip discussion, I also added code to dispatch when σ (of the Normal kernel) is fixed, but possibly unequal to 1. This relates to a conversation we had a while ago about constraints in kernels/likelihoods.
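For concreteness, here is what the analytical rule boils down to, as a minimal plain-Julia sketch (the (μ, σ) tuples only stand in for MeasureTheory's Normal parameters; this is an illustration, not the convolve method in this PR):

```julia
# Convolution of two independent Normals: means add, variances add.
# (μ, σ) tuples stand in for Normal(μ, σ); illustration only, not the PR's code.
convolve_normals((μ1, σ1), (μ2, σ2)) = (μ1 + μ2, sqrt(σ1^2 + σ2^2))

convolve_normals((0.0, 1.0), (0.0, 1.0))  # (0.0, 1.4142135623730951)
```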

One possible concern (though not a concern for my use case) is the following:

julia> Normal(0,1) ↣  MeasureTheory.kernel(Normal{(:μ,)}, μ=identity)
Normal(μ = 0.0, σ = 1.41421)

julia> Normal(0,1) ↣  MeasureTheory.kernel(Normal{(:μ,)})
MeasureBase.Bind{Normal{(:μ, :σ), Tuple{Int64, Int64}}, ParameterizedTransitionKernel{Type{Normal}, ComposedFunction{typeof(params), MeasureBase.var"#f#8"{Normal, (:μ,)}}, (:μ,), Tuple{MeasureBase.var"#54#56"{Int64}}}}(
    Normal(μ = 0, σ = 1),
    ParameterizedTransitionKernel(Normal, MeasureBase.params ∘ MeasureBase.var"#f#8"{Normal, (:μ,)}(), (μ = MeasureBase.var"#54#56"{Int64}(1),)))

Let me know what you think!

nignatiadis avatar Aug 21 '22 07:08 nignatiadis


Codecov Report

Merging #226 (06b8933) into master (6a42ef6) will increase coverage by 0.37%. The diff coverage is 66.66%.

@@            Coverage Diff             @@
##           master     #226      +/-   ##
==========================================
+ Coverage   44.60%   44.98%   +0.37%     
==========================================
  Files          42       44       +2     
  Lines        1307     1325      +18     
==========================================
+ Hits          583      596      +13     
- Misses        724      729       +5     
| Impacted Files | Coverage Δ |
| --- | --- |
| src/MeasureTheory.jl | 50.00% <ø> (ø) |
| src/combinators/convolve.jl | 0.00% <0.00%> (ø) |
| src/combinators/affine.jl | 53.88% <85.71%> (+1.28%) :arrow_up: |
| src/parameterized/pairwise/normal_normal.jl | 100.00% <100.00%> (ø) |
| src/distproxy.jl | 50.00% <0.00%> (+16.66%) :arrow_up: |


codecov[bot] avatar Aug 21 '22 07:08 codecov[bot]


Thanks @nignatiadis !

I see how the result in the normal case is equivalent to a convolution, but more generally I think bind builds a compound distribution. So for us, maybe a compound measure?

There's a nice overview here. Eventually (though not necessarily this PR, unless you want to) we should have all of the examples in that article as unit tests :)

cscherrer avatar Aug 21 '22 13:08 cscherrer

Yes, exactly! A lot of my research is about such compound distributions. I plan to add a few more to this PR (though not all of the ones in the Wikipedia list).

nignatiadis avatar Aug 21 '22 13:08 nignatiadis

Also, I had never heard the term bind used before for this operation, but then noticed that bind in the package had exactly the semantics I was interested in. Another way to think about this is as a composition operation on Markov kernels (see e.g. the following, as well as the section just above it).

In my papers I typically just call it the marginal distribution, where "marginal" refers to the fact that we start with a joint distribution on (X, Y), with samples generated as X ~ μ, Y ~ k(X) for a measure μ and a kernel k, but are then only interested in the marginal of Y. [Calling this a marginal distribution in this package would cause ambiguity.]
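As a quick sanity check of that marginal-of-Y picture in the Normal case, a plain-Julia simulation (not package code) should recover the convolution result from above:

```julia
using Random, Statistics

# X ~ Normal(0, 1), then Y | X ~ Normal(X, 1);
# the marginal of Y should be Normal(0, sqrt(2)).
Random.seed!(1)
ys = [randn() + randn() for _ in 1:10^6]
(mean(ys), std(ys))   # ≈ (0.0, 1.414)
```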

nignatiadis avatar Aug 21 '22 14:08 nignatiadis

Some background on bind...

In functional programming, a monad is a type constructor m that has two operations:

unit :: a -> m a
bind :: m a -> (a -> m b) -> m b

bind is usually written as >>=, though that's awkward to write and Julia doesn't allow it. For a quick workaround I used ↣, though I'm not sure we should stay with that.

Monads are also required to satisfy the "monad laws":

unit(x) >>= f == f(x)
m >>= unit == m
m >>= (x -> (f(x) >>= g)) == (m >>= f) >>= g

For us,

  • m is $\mathcal{M}$, where $\mathcal{M}(X)$ is "measures on $X$" (mathematically -- we don't have this notation in code)
  • unit is Dirac
  • bind is... well, currently it's bind, but maybe it should be compound or something more measure-specific?
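To make the correspondence concrete, here is a toy monad of finite discrete measures in plain Julia (just an illustration of unit/bind and the compound interpretation, not MeasureTheory's API):

```julia
# A "measure" here is a Dict mapping outcomes to weights.
unit(x) = Dict(x => 1.0)                 # plays the role of Dirac

function bind(m::Dict, f)                # m >>= f
    out = Dict{Any,Float64}()
    for (x, w) in m, (y, v) in f(x)
        out[y] = get(out, y, 0.0) + w * v
    end
    return out
end

coin = Dict(0 => 0.5, 1 => 0.5)
# compound measure: draw x from coin, then add ±1 noise
bind(coin, x -> Dict(x - 1 => 0.5, x + 1 => 0.5))
# Dict(-1 => 0.25, 0 => 0.25, 1 => 0.25, 2 => 0.25)
```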

cscherrer avatar Aug 21 '22 14:08 cscherrer

This package also aims at a very similar task: https://github.com/filtron/MarkovKernels.jl

nignatiadis avatar Nov 07 '22 17:11 nignatiadis

Good find! That looks very useful

cscherrer avatar Nov 07 '22 17:11 cscherrer

@filtron

mschauer avatar Nov 07 '22 20:11 mschauer

@filtron

Hello. Can I be of assistance somehow?

filtron avatar Nov 09 '22 11:11 filtron

I think you can be witness to us appreciating your package, and perhaps help figure out whether there is some problem we share.

mschauer avatar Nov 09 '22 13:11 mschauer

Pardon my late reply. I am quite new to Julia, so encouragement and feedback are much appreciated. Regarding discussing issues we might share, I feel like I am missing some context: we are both interested in convolving Normal distributions, but it is not clear to me what the issue is beyond that.

filtron avatar Nov 14 '22 23:11 filtron

@filtron: Thank you for your response! I was very excited to see your MarkovKernels.jl package; thanks for writing it!

Here's the reason I initially tagged your package in this thread: MeasureTheory.jl and its associated packages (e.g., MeasureBase.jl) aim to provide abstractions for working with measures and reasoning about them rigorously in Julia; see, e.g., the following article. One of those abstractions is Markov kernels and likelihoods, but that part of the design is not yet 100% set in stone (I think; Chad and Moritz, please correct me). Your package provides these abstractions as well, along with very useful functionality, and it would be awesome if it could be provided as part of MeasureTheory.jl or through the MeasureTheory.jl interface (which would perhaps require some changes to satisfy the requirements underlying your package; a discussion here could be fruitful).

nignatiadis avatar Nov 15 '22 00:11 nignatiadis

I would be happy to help get any desired functionality from MarkovKernels.jl into MeasureTheory.jl. I suppose what we are concretely talking about are the concepts I call compose, marginalise, and invert for, by the looks of it, Normal distributions and kernels in Cholesky parametrization?

filtron avatar Nov 17 '22 13:11 filtron

If you want, have a look at https://arxiv.org/abs/2010.03509. We also end up with such primitives, which describe Bayesian inference in tree models (you can formulate a state-space model this way too):

Pullbacks of likelihoods

$$ h_k (x) = \int k(y\mid x) h(y) \mathrm{d} y,\quad (1) $$

For example, the fusion step of the Kalman filter is a form of step (1).

With that you add tilting of kernels with likelihoods

$$ k^h(y \mid x) = \frac{k(y\mid x) h(y)}{h_k(x)}, \quad (2) $$

and marginalisation

$$ k^h (y) = \int k^h(y\mid x) \pi(x) \mathrm{d} x, \quad (3) $$

Together, (1), (2) and (3) allow you to describe an RTS-type smoother made of a backward filter with fusion step (1) and a forward smoother (2).
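For what it's worth, here is how (1) and (2) look in the scalar Gaussian case, as a plain-Julia sketch under the assumption k(y | x) = N(y; a*x, q) and h(y) = N(y; z, r) (this is neither MeasureTheory nor MarkovKernels code):

```julia
# A Gaussian likelihood (up to normalisation) represented by mean and variance.
struct GaussLik
    mean::Float64
    var::Float64
end

# (1) Pullback: h_k(x) = ∫ k(y | x) h(y) dy = N(a*x; z, q + r);
# as a function of x it is Gaussian with mean z/a and variance (q + r)/a^2.
pullback(a, q, h::GaussLik) = GaussLik(h.mean / a, (h.var + q) / a^2)

# (2) Tilting: k^h(y | x) ∝ k(y | x) h(y), again Gaussian in y
# (the standard product of the two Gaussian densities N(a*x, q) and N(z, r)).
tilt(a, q, h::GaussLik, x) = GaussLik(
    (h.var * a * x + q * h.mean) / (q + h.var),
    q * h.var / (q + h.var),
)

pullback(1.0, 0.5, GaussLik(2.0, 1.0))   # GaussLik(2.0, 1.5)
tilt(1.0, 0.5, GaussLik(2.0, 1.0), 0.0)  # GaussLik(≈0.667, ≈0.333)
```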

mschauer avatar Nov 17 '22 14:11 mschauer

I have been meaning to get back to you on this; please don't mistake my tardiness for disinterest. Doing inference on trees by means of the h-transform is indeed an interesting development. However, at the risk of coming off as pedantic, in my mind there are three ways of doing inference in partially observed Markov processes, namely:

  1. Filter forward, predict backwards (i.e. RTS)
  2. Filter forward and "filter" backwards, correct forward filter with backward filter (i.e. two-filter smoother)
  3. "Filter" backward, predict forwards (i.e. Doob's h-transform)

Formally, these alternatives produce the same output. However, algorithmically they are quite different to me and offer different avenues for developing approximate inference methods. The above looks more like alternative 3 to me (I guess 1 & 2 would also have tree analogues).

In any case, I would like to re-iterate that I would be pleased to collaborate.

filtron avatar Nov 30 '22 22:11 filtron