HierarchicalModelLogPDF
Would it be of interest to have a HierarchicalModelLogPDF class in pints?
So something that produces a joint LogPDF from individual LogPDFs with a specific dependence structure:
p(y, psi, theta) = p(y|psi)p(psi|theta)p(theta)
At the moment it is easy enough to implement the individual conditional PDFs with the existing functionality in pints; I just haven't seen how to easily create sums of LogPDFs with different parameter spaces, or ones with this hidden-state structure.
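In log space the factorisation just becomes a sum, which is what such a composite LogPDF would have to evaluate. A minimal sketch with hypothetical callables (none of these are existing pints classes):

def joint_log_pdf(log_y_given_psi, log_psi_given_theta, log_theta, psi, theta):
    # log p(y, psi, theta) = log p(y|psi) + log p(psi|theta) + log p(theta);
    # each argument is a callable returning the corresponding log-density,
    # with the data y assumed to be fixed inside log_y_given_psi
    return (log_y_given_psi(psi)
            + log_psi_given_theta(psi, theta)
            + log_theta(theta))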
If you can do it, then Yes please!
I think @martinjrobins and @chonlei both felt they couldn't generalise their hierarchical schemes enough?
Sum of PDFs with disjoint parameter sets should be easy enough. Somewhat analogous to 'pints.SumOfIndependentLogPDFs', but conditioning on disjoint parameter sets and introducing some rule for the mapping, say the parameters have to be in the order [psi, theta].
import numpy as np
import pints

class SumOfDisjointLogPDFs(pints.LogPDF):  # hypothetical name
    def __init__(self, log_likelihoods):
        self._log_likelihoods = log_likelihoods
        # Get dimensions
        self._indiv_n_parameters = []
        for e in self._log_likelihoods:
            self._indiv_n_parameters.append(e.n_parameters())
        self._n_parameters = int(np.sum(self._indiv_n_parameters))
        # Add a 0 in front to make the slicing in __call__ easier
        self._indiv_n_parameters = np.cumsum([0] + self._indiv_n_parameters)

    def n_parameters(self):
        return self._n_parameters

    def __call__(self, x):
        total = 0
        for i, e in enumerate(self._log_likelihoods):
            total += e(
                x[self._indiv_n_parameters[i]:self._indiv_n_parameters[i + 1]])
        return total
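Using it would then look something like this, with two arbitrary one-parameter priors standing in for p(psi|theta) and p(theta), just to show the parameter ordering:

log_psi = pints.GaussianLogPrior(0, 1)     # stands in for p(psi | theta)
log_theta = pints.GaussianLogPrior(0, 10)  # stands in for p(theta)
combined = SumOfDisjointLogPDFs([log_psi, log_theta])
# Parameters are concatenated in the order the LogPDFs were passed: [psi, theta]
print(combined([0.3, -1.2]))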
But having varying psi messes up the intended functionality of the likelihood classes a bit for p(psi|theta).
So would it be acceptable to alter the values in the likelihood p(psi|theta), even though usually the "data" is not meant to change?
A quick solution would be to access the private variable self._values in the pints problem and update the psi values every iteration.
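A rough sketch of that workaround (explicitly a hack: _values is private API, and depending on the likelihood class the values may be cached on the problem, on the likelihood, or on both):

def evaluate_top_level(top_log_likelihood, problem, psi, theta):
    # Treat the current bottom-level draws psi as the 'data', then evaluate
    # the likelihood of the top-level parameters theta given those psi
    problem._values = psi
    top_log_likelihood._values = psi
    return top_log_likelihood(theta)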
I'm looking at doing a hierarchical prior function, to at least add this functionality for a small subset of cases. I've written a docstring that describes what I'm intending for it to do: does it sound reasonable for a first hierarchical prior function?
[image: screenshot of the proposed docstring: https://user-images.githubusercontent.com/55392151/83644381-252d4000-a5a9-11ea-90c3-9e9a5283b837.png]
In particular @ben18785 the hierarchical guru
Looks really good Simon -- thanks!
@MichaelClerx @ben18785 @martinjrobins @chonlei
In addition to the really cool HierarchicalLogPrior that @simonmarchant is implementing, I've been thinking about how one could compose hierarchical posteriors that are not constrained to normally distributed bottom-level parameters, and, more importantly, where you have the option to pool some of the bottom-level parameters. In my problems you often want some parameters, like the noise, to be shared between individuals, while others may vary. I couldn't really work out how pymc3 could be integrated with pints to do this.
So here is my suggestion (sorry for the unrendered TeX, but maybe just look at the example at the bottom to get a feel for the interface):
class HierarchicalLogPosterior(LogPDF):
r"""
Represents the log-posterior of an hierarchical model
.. math::
\log \mathbb{P}(\theta ,\psi ^{(1)}, \ldots ,\psi ^{(n)} |
D^{(1)}, \ldots ,D^{(n)}) =
\sum _{i=1}^n \log \mathbb{P}(y | \psi ^{(i)})
\Big | _{y = D^{(i)}} +
\sum _{i=1}^n \log \mathbb{P}(\psi ^{(i)} | \theta) + \log \mathbb{P}(\theta ) +
\text{constant},
where :math:`y` is the observable output of the hierarchical model whose
distribution is parametrised by :math:`\psi`. The top-level parameters
:math:`\theta` determine the distribution of :math:`\psi`. We refer to
:math:`\sum _{i=1}^n\log\mathbb{P}(y|\psi ^{(i)})\Big |_{y=D^{(i)}}`
as the bottom-level log-likelihood, and to
:math:`\sum _{i=1}^n \log \mathbb{P}(\psi ^{(i)} | \theta)` as the top-level
log-likelihood.
Hierarchical modelling of :math:`n` data sets :math:`D^{(1)}, \ldots, D^{(n)}`
is a compromise between a pooled analysis, in which one set of model
parameters :math:`\psi` is assumed to be suitable for all observations, and
an unpooled analysis, where an independent set of parameters
:math:`\psi ^{(i)}` is inferred for each data set :math:`D^{(i)}`. In
hierarchical models the :math:`\psi ^{(i)}` are assumed to
follow a top-level distribution parametrised by parameters :math:`\theta`.
The :class:`HierarchicalLogPosterior` takes a list of bottom-level
log-likelihoods of length :math:`n`, a list of top-level likelihoods of
length :math:`b`, and a list of priors of length :math:`t`. The
bottom-level log-likelihoods represent the likelihoods for the
parameters :math:`\psi ^{(i)}` for each data set :math:`D^{(i)}`, where
each parameter set is of size :math:`b`. The top-level log-likelihoods
represent the likelihoods of the parameters :math:`\theta` for a given set
of bottom parameters :math:`\psi ^{(1)}, \ldots , \psi ^{(n)}`. For each of
the :math:`t` top-level parameters :math:`\theta`, a prior is passed to the
hierarchical log-posterior class.
The log-posterior can be evaluated by passing an array-like object with
parameters (:math:`\theta, \psi ^{(1)}, \ldots , \psi ^{(n)}`) of length
:math:`t + nb`. Parameters are sorted such that top-level parameters come
before bottom-level parameters. The parameter order within the
log-likelihoods is preserved, and parameters from different
log-likelihoods are appended.
Optionally, a subset of the parameters :math:`\psi` can be chosen to be
pooled by passing a boolean array-like object ``non_hierarchical_params``
of length :math:`b` to the HierarchicalLogPosterior. ``True`` indicates
pooling. With :math:`p` pooled parameters the length of the parameters
reduces to :math:`t + p + n(b-p)`. Pooled parameters count as top-level
parameters, and therefore come before the remaining bottom-level parameters.
Extends :class:`LogPDF`
Parameters
----------
bottom_log_likelihood : List of :class:`LogPDF` of length :math:`n`
A list of log-likelihoods of the parameters :math:`\psi^{(i)}` for each
of the data sets :math:`D^{(1)}, \dots, D^{(n)}`. All log-likelihoods in the
list live on the same :math:`b`-dimensional parameter space.
top_log_likelihood : List of :class:`LogPrior` of length :math:`b`
A list of log-likelihoods of the parameters :math:`\theta` for a given
set of bottom-level parameters
:math:`\psi ^{(1)}, \ldots , \psi ^{(n)}`. If :math:`p` of the
:math:`b` bottom-level parameters are pooled, only :math:`b-p`
top-level likelihoods can be passed.
log_prior : List of :class:`LogPrior` of length :math:`t`
A list of log-priors for the top-level parameters :math:`\theta`. If
:math:`p` of the bottom-level parameters are pooled, :math:`t+p`
log-priors need to be passed to the HierarchicalLogPosterior.
non_hierarchical_params : Array-like boolean of length :math:`b`
Mask which determines which of the :math:`b` parameters :math:`\psi`
may be pooled.
Example
-------
::
# Three schools problem: Assume data for the different schools has
# already been integrated into three pints.SingleOutputProblem with
# one parameter each for the mean score of the school.
#
# Bottom-level model is Normal distribution, where for illustration we
# choose to pool the variance parameter. Top-level model is also a
# Normal distribution.
# Bottom-level log-likelihoods
bottom_log_likelihood_school_one = pints.GaussianLogLikelihood(
problem_school_one)
bottom_log_likelihood_school_two = pints.GaussianLogLikelihood(
problem_school_two)
bottom_log_likelihood_school_three = pints.GaussianLogLikelihood(
problem_school_three)
# Top-level log-likelihoods
top_log_likelihood = pints.GaussianLogPrior(
mean=0, sd=1) # Initial values have no effect
# Log-priors
bottom_variance_log_prior = pints.HalfCauchyLogPrior(
location=0, scale=5)
top_mean_log_prior = pints.GaussianLogPrior(
mean=0, sd=5)
top_variance_log_prior = pints.HalfCauchyLogPrior(
location=0, scale=5)
# Hierarchical log-posterior
log_posterior = pints.HierarchicalLogPosterior(
bottom_log_likelihood=[
bottom_log_likelihood_school_one,
bottom_log_likelihood_school_two,
bottom_log_likelihood_school_three],
top_log_likelihood=[
top_log_likelihood],
log_prior=[
bottom_variance_log_prior,
top_mean_log_prior,
top_variance_log_prior],
non_hierarchical_params=[
False, True])  # Bottom-level mean: hierarchical, variance: pooled
# Evaluation of the hierarchical log-posterior
mock_top_mean = 2.5
mock_top_variance = 4
mock_bottom_variance = 1
mock_mean_school_one = 1
mock_mean_school_two = 2
mock_mean_school_three = 3
value = log_posterior([
mock_top_mean,
mock_top_variance,
mock_bottom_variance,
mock_mean_school_one,
mock_mean_school_two,
mock_mean_school_three])
"""
I'm grateful for any suggestions.
I realised that for gradient-based sampling or optimisation, we would need to be able to compute the partials of the top-level likelihoods with respect to both their inputs \theta and the bottom-level parameters \psi.
If we used LogPriors for the top-level likelihoods, the priors would need an additional method that returns the partials with respect to the hyperparameters.
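For illustration, a sketch of what such an additional method could look like for a Gaussian top-level distribution (the method name and signature are made up, not existing pints API):

import numpy as np
import pints

class GaussianTopLevel(pints.LogPrior):
    def evaluateS1_hyperparameters(self, psi, theta):
        # Hypothetical extension of the evaluateS1 idea: return the
        # log-density of a bottom-level sample psi under N(mu, sigma^2)
        # together with its partials w.r.t. the hyperparameters (mu, sigma)
        mu, sigma = theta
        value = (-0.5 * np.log(2 * np.pi) - np.log(sigma)
                 - 0.5 * ((psi - mu) / sigma) ** 2)
        d_mu = (psi - mu) / sigma ** 2
        d_sigma = ((psi - mu) ** 2 - sigma ** 2) / sigma ** 3
        return value, np.array([d_mu, d_sigma])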