
feature request: add support for accessing log normalizer

Open · murphyk opened this issue 1 year ago · 9 comments

Computing the log partition function for a conjugate family is a very useful quantity, e.g. for computing the marginal likelihood p(D), which is needed for empirical Bayes and model selection. (See the screenshot from my book below, which shows p(D) = Z(post)/Z(prior).)

Also it would be nice to have a worked example of a likelihood+prior=posterior computation for some simple familiar families, like binomial+beta=beta, or gauss+gauss=gauss.

[Screenshot from the book: the marginal likelihood of a conjugate model, p(D) = Z(post)/Z(prior).]

murphyk · Nov 13 '24

Nice to hear from you again, Kevin!

Computing the log partition function for a conjugate family is a very useful quantity

I agree. That's NaturalParametrization.log_normalizer in efax. All distributions implement that.

for computing the marginal likelihood p(D)

Ah, that's a cool use that I hadn't considered! Could we provide some functions to make this calculation more convenient? It looks like the quotient of the normalizers (posterior over prior) times the carrier measure of the data?
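
Concretely, for a conjugate pair, $\log p(D) = A(\eta_{post}) - A(\eta_{prior}) + \sum_i \log h(x_i)$. A minimal sketch with log_normalizer (using the Beta/Bernoulli pair, where the log carrier measure is zero; the helper name is mine):

import jax.numpy as jnp

from efax import BetaNP

# Conjugate identity: log p(D) = A(eta_post) - A(eta_prior) + sum_i log h(x_i).
# For a Bernoulli likelihood the carrier measure h(x) is 1, so its log vanishes.
def log_marginal_likelihood(prior: BetaNP, posterior: BetaNP) -> jnp.ndarray:
    return posterior.log_normalizer() - prior.log_normalizer()

# Beta(1, 1) prior; after observing 3 heads and 1 tail the posterior is
# Beta(4, 2), so log p(D) = log B(4, 2) - log B(1, 1) = log(1/20).
prior = BetaNP(alpha_minus_one=jnp.array([0.0, 0.0]))
posterior = BetaNP(alpha_minus_one=jnp.array([3.0, 1.0]))
print(log_marginal_likelihood(prior, posterior))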

Also it would be nice to have a worked example of a likelihood+prior=posterior computation for some simple familiar families, like binomial+beta=beta, or gauss+gauss=gauss.

That's a good idea. I'd like to add something like that this week. Essentially, these examples (done for the Gaussian case; see the sketch after this list) should take the form:

  • Convert the prior and likelihood to natural parameters, prior and likelihood.
  • Add them: posterior = efax.parameter_map(operator.add, prior, likelihood).
  • Convert the result back to whatever source parametrization you want.

Do you think that would add clarity? If you have an idea, pull requests are always welcome!
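
A minimal sketch of those three steps for the Gaussian case (the NormalNP field names and the to_exp() conversion are my best guesses at the spelling, so treat them as assumptions):

import operator

import jax.numpy as jnp

import efax
from efax import NormalNP

# Assumed field names: the natural parameters of a scalar normal are
# (mean * precision, -precision / 2).
def normal_np(mean: float, variance: float) -> NormalNP:
    precision = 1.0 / variance
    return NormalNP(mean_times_precision=jnp.asarray(mean * precision),
                    negative_half_precision=jnp.asarray(-0.5 * precision))

prior = normal_np(mean=0.0, variance=4.0)
likelihood = normal_np(mean=1.0, variance=1.0)

# Pointwise product of densities = sum of natural parameters.
posterior = efax.parameter_map(operator.add, prior, likelihood)

# Convert back to read off the moments; here the posterior is N(0.8, 0.8).
print(posterior.to_exp())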

NeilGirdhar · Nov 18 '24

yes, sounds great. You could even have a unit test like reproducing some of the demos below

https://github.com/probml/pyprobml/blob/master/notebooks/book1/04/beta_binom_post_plot.ipynb
https://github.com/probml/pyprobml/blob/master/notebooks/book1/04/beta_binom_post_pred_plot.ipynb
https://github.com/probml/pyprobml/blob/master/notebooks/book2/03/gauss_seq_update_sigma_1d.ipynb
https://github.com/probml/pyprobml/blob/master/notebooks/book1/03/gauss_infer_2d.ipynb
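
For example, a test along these lines (a rough sketch reusing the efax calls from this thread; the closed-form check assumes conjugate_prior_distribution follows the same (alpha - 1, beta - 1) convention as BetaNP):

import jax.numpy as jnp
from jax.random import PRNGKey

import efax
from efax import BernoulliEP, BernoulliNP, BetaNP, parameter_mean

def test_beta_bernoulli_update() -> None:
    # Beta(2, 2) prior in natural coordinates (alpha - 1, beta - 1).
    prior = BetaNP(alpha_minus_one=jnp.array([1.0, 1.0]))
    samples = BernoulliEP(probability=jnp.array(0.4)).sample(PRNGKey(0), (100,))
    heads = jnp.sum(samples)
    # Turn the data into Beta-shaped evidence via the mean sufficient statistic.
    ss_mean = parameter_mean(BernoulliNP.sufficient_statistics(samples), axis=0)
    evidence = ss_mean.conjugate_prior_distribution(jnp.asarray(100.0))
    posterior = efax.parameter_map(jnp.add, prior, evidence)
    # Closed form: Beta(2 + heads, 2 + tails).
    expected = jnp.array([1.0 + heads, 1.0 + (100 - heads)])
    assert jnp.allclose(posterior.alpha_minus_one, expected)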


murphyk · Nov 18 '24

reproducing some of the demos below

Cool! I think the closest thing I have is this test. It samples from scipy and then does maximum-likelihood estimation (which is essentially the prior-likelihood combination in the conjugate prior). What do you think?

I'll take a closer look at your examples this week.

NeilGirdhar · Nov 18 '24

It seems that getting the conjugate prior distribution from the data (the likelihood) and adding it to an existing prior should work. Maybe define a function for doing this?

zinccat · Nov 26 '24

Here's a simple example for Bernoulli. It would be great if some of the methods could be designed to make this smoother; e.g., I don't quite get the sufficient_statistics method.


import jax.numpy as jnp
from jax.random import PRNGKey

import efax
from efax import BernoulliNP, BernoulliEP, BetaNP, parameter_mean

# Beta(2, 2) prior, in natural coordinates (alpha - 1, beta - 1).
prior = BetaNP(alpha_minus_one=jnp.array([1.0, 1.0]))

# Draw 100 samples from a Bernoulli(0.4) likelihood.
n = (100,)
dist = BernoulliEP(probability=jnp.array([0.4]))
samples = dist.sample(PRNGKey(0), n)

# Average the sufficient statistics of the samples, then express the evidence
# they carry as a Beta distribution in natural coordinates.
ss = BernoulliNP.sufficient_statistics(samples)
ss_mean = parameter_mean(ss, axis=0)
print(ss_mean)
likelihood = ss_mean.conjugate_prior_distribution(jnp.array(n))

print("prior", prior)
print("likelihood", likelihood)

# Posterior: add the natural parameters of the prior and the evidence.
posterior = efax.parameter_map(jnp.add, prior, likelihood)

print("posterior", posterior)

zinccat · Nov 26 '24

It seems that getting the conjugate prior distribution from the data (the likelihood) and adding it to an existing prior should work. Maybe define a function for doing this?

I think I know what you're getting at, but let's make sure we're on the same page. There are two major ways of combining evidence:

  • (1) If you're just combining evidence (in the pointwise-product-of-densities sense), you simply convert your distributions ($X_i$) to their natural parametrization and add their parameters. I just made an example to illustrate that.
  • (2) On the other hand, suppose that you have a sensor that has recorded readings of some true value (e.g., temperature), and each reading is a value $x_i$. If we would like to know the maximum-likelihood distribution over the true value given all of the readings, then we should convert the readings to their sufficient statistics and take the expected value of those statistics. This is illustrated in this example.
  • A variant of (2): if the readings are not values but distributions $X_i$ over the true value, we should convert the distributions to their expectation parameters and take the expected value. This is shown in section 1.3.3, but there is no example; I'm happy to add one if either of you thinks it would be helpful (see the sketch after this list).

Does this all make sense? What do you think?
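
That sketch would look something like this (made-up readings; I'm assuming NormalEP carries mean and second_moment):

import jax.numpy as jnp

from efax import NormalEP, parameter_mean

# Three distribution-valued readings of the true temperature, each with
# variance 0.5, so second moment mean**2 + 0.5.
means = jnp.array([20.1, 19.8, 20.4])
readings = NormalEP(mean=means, second_moment=means ** 2 + 0.5)

# Combining distribution-valued evidence: average the expectation parameters.
combined = parameter_mean(readings, axis=0)
print(combined)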

Here's a simple example for Bernoulli. It would be great if some of the methods could be designed to make this smoother; e.g., I don't quite get the sufficient_statistics method.

Your example looks perfect to me. Every part of it shows a clear intent, with code that corresponds to that intent. I think it could be wrapped up in a convenience function. Why don't we try coding that up, and then think about whether it belongs in your code or in efax?
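
Something like this, maybe (a hypothetical helper built from the calls in your example, not an existing efax function):

import jax.numpy as jnp

import efax
from efax import BernoulliNP, BetaNP, parameter_mean

# Hypothetical: fold raw Bernoulli observations into a Beta prior.
def posterior_from_data(prior: BetaNP, samples: jnp.ndarray) -> BetaNP:
    n = samples.shape[0]
    ss_mean = parameter_mean(BernoulliNP.sufficient_statistics(samples), axis=0)
    evidence = ss_mean.conjugate_prior_distribution(jnp.asarray(float(n)))
    return efax.parameter_map(jnp.add, prior, evidence)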

NeilGirdhar · Nov 27 '24

I think the examples make sense. And I guess in a real scenario we'll have a prior and some observations, which will require us to get the likelihood from the observations and add it to the prior?

zinccat · Nov 28 '24

I don't quite understand the current design of sufficient_statistics; it seems that in the beta-bernoulli example, ss is just storing the samples?

zinccat · Nov 28 '24

I don't quite understand the current design of sufficient_statistics; it seems that in the beta-bernoulli example, ss is just storing the samples?

Did you try reading expfam.pdf?

NeilGirdhar · Nov 28 '24