msmbuilder-legacy
[WIP] MCMC sampling of rates and Tprobs
This is to give people a feel for the interface. At this point, mcmc.TransitionSampler and mcmc.RateSampler are working.
Here is the driver code:
import numpy as np
from msmbuilder import mcmc

num_states = 3
alpha = 1
C = np.ones((num_states, num_states))
C[0, 0] = 1000

# Posterior sampling of transition matrices.
sampler = mcmc.TransitionSampler(C)
sampler.sample(50000, thin=20)
Ti = []
for T in sampler.iterate_T():
    Ti.append(T)
Ti = np.array(Ti)
T = Ti.mean(0)

# Posterior sampling of rate matrices.
sampler = mcmc.RateSampler(C)
sampler.sample(50000, thin=20)
Ki = []
for K in sampler.iterate_K():
    Ki.append(K)
Ki = np.array(Ki)
K = Ki.mean(0)
The reason I use the iterate_K() function is that we have to handle two cases (see the sketch after this list):
- The samples all fit in memory.
- The samples are too big to fit in memory.
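To illustrate the idea, here is a minimal sketch of the generator pattern involved (not the PR's actual implementation; iterate_samples is a hypothetical name):

import numpy as np

def iterate_samples(trace):
    """Yield one sample at a time so callers never hold the full trace in RAM."""
    for i in range(len(trace)):
        yield trace[i]

# Running mean without materializing the whole trace:
trace = np.random.rand(100, 3, 3)  # stand-in for an on-disk pymc trace
mean = np.zeros((3, 3))
for n, T in enumerate(iterate_samples(trace), start=1):
    mean += (T - mean) / n  # incremental (Welford-style) mean update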
I'll add more comments and docstrings later. It might be useful for people to just ask questions about what isn't intuitive about the PyMC interface.
Looks cool. What is the 'thin' kwarg supposed to do?
Will this only work for dense matrices? Do you think this is going to be possible for a 10,000 state model?
thin controls how much the MCMC chain is subsampled to reduce autocorrelation: with thin=20, only every 20th sample is stored, so sample(50000, thin=20) keeps 50000 / 20 = 2500 samples.
The current implementation will only work for dense models. I have ideas about how to scale things up better, but it's definitely going to be a challenge. I think there are open questions about what priors to use, etc.
For example, transition matrices are never sparse, so we shouldn't really be assuming sparsity during estimation. This leads to huge problems for 10,000-state models: a dense double-precision matrix of that size is already 10,000 × 10,000 × 8 bytes ≈ 800 MB, and every MCMC sample is another matrix that size.
I think things might be better for rate matrices, but then we have the issue of matrix exponentiation.
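For context, turning a sampled rate matrix into transition probabilities requires a matrix exponential, which is the expensive step (a small sketch; the 2-state K and lag time are illustrative):

import numpy as np
import scipy.linalg

# Illustrative 2-state rate matrix: nonnegative off-diagonals, rows sum to zero.
K = np.array([[-0.1,  0.1],
              [ 0.3, -0.3]])
tau = 1.0  # lag time

T = scipy.linalg.expm(K * tau)  # dense O(n^3); this is what hurts for large n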
Also: the current code uses an Uninformative prior, not a Dirichlet prior. I'm still figuring out the best way to do a Dirichlet prior. The issue is that I need to "stack" a bunch of Dirichlet random variables, but PyMC doesn't have a very elegant way to do this.
I think I might end up just using an Uninformative prior and throwing the prior on as an external "potential".
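For concreteness, here is a minimal sketch of the "external potential" idea in pymc 2; the row variable T_rows is a stand-in, not the PR's actual parameterization:

import numpy as np
import pymc

alpha = 1.0  # with alpha = 1 this potential reduces to the uniform (zero) case
num_states = 3

# Stand-in for an unconstrained row-stochastic variable.
T_rows = pymc.Uniform("T_rows", lower=0.0, upper=1.0,
                      size=(num_states, num_states))

@pymc.potential
def dirichlet_prior(T=T_rows):
    # Sum of row-wise Dirichlet(alpha) log-densities, up to normalizing constants.
    return (alpha - 1.0) * np.sum(np.log(T))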
Speed is definitely an issue here--these chains are super slow to converge...
How slow do you mean?
Waiting a day probably isn't a big deal. Can you make a parallel version?
It seems like we need ~1-2 minutes to converge for small matrices. I think the code is still worth having, as it's simple and easy to understand. Some error analysis is better than none...
Agreed.
So for estimating rates, it might be natural to use an exponential distribution as a prior. This should place maximum prior weight at $K_{ij} = 0$, inducing sparsity in the rate matrix.
We would then have a single hyperparameter, $\lambda$, that would be shared for all the rates. We could estimate $\lambda$ by cross validation.
I can't remember if I've ever seen this in the literature, but it seems like the right thing to do.
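A minimal sketch of that prior as a pymc potential (the variable K_offdiag and its bounds are assumptions for illustration):

import numpy as np
import pymc

lam = 10.0  # shared hyperparameter lambda, to be set by cross validation
num_states = 3

# Stand-in for the sampler's nonnegative off-diagonal rates.
K_offdiag = pymc.Uniform("K_offdiag", lower=0.0, upper=100.0,
                         size=(num_states, num_states))

@pymc.potential
def exponential_prior(K=K_offdiag):
    # log prod_ij lam * exp(-lam * K_ij) = const - lam * sum(K);
    # the constant doesn't affect sampling, so we drop it.
    return -lam * np.sum(K)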
I just checked in a revised codebase that appears to be working. Here is a driver program that exercises all four model types:
import numpy as np
import scipy.sparse
from msmbuilder import mcmc, MSMLib

num_states = 2
alpha = 1
C = 100 * np.ones((num_states, num_states))
C[0, 0] *= 1000
C[0, 1] += 1

# Reference point estimates: the naive MLE and the reversible MLE.
T0 = MSMLib.estimate_transition_matrix(C)
X0 = MSMLib.mle_reversible_count_matrix(scipy.sparse.csr_matrix(C)).toarray()
T0_rev = MSMLib.estimate_transition_matrix(X0)

sampler = mcmc.TransitionSampler(C)
sampler.sample(1000000, thin=500)
T = sampler.mcmc.trace("T")[:].mean(0)
T_sigma = sampler.mcmc.trace("T")[:].std(0)

sampler = mcmc.RateSampler(C)
sampler.sample(1000000, thin=500)
T_K = sampler.mcmc.trace("T")[:].mean(0)
T_K_sigma = sampler.mcmc.trace("T")[:].std(0)

sampler = mcmc.ReversibleTransitionSampler(C)
sampler.sample(1000000, thin=500)
T_rev = sampler.mcmc.trace("T")[:].mean(0)
T_rev_sigma = sampler.mcmc.trace("T")[:].std(0)

sampler = mcmc.ReversibleRateSampler(C)
sampler.sample(1000000, thin=500)
T_K_rev = sampler.mcmc.trace("T")[:].mean(0)
T_K_rev_sigma = sampler.mcmc.trace("T")[:].std(0)
So I'm finding some differences between the different models. The differences are most obvious in the few-count regime, as we would expect. Thus, we have an interesting question: what is the right answer? I suspect there might not be a "right" answer.
Hm, there might still be something funny with the reversible rate sampler. I'm looking into it.
I think I fixed the issue with the ReversibleRateSampler. I needed to discard the early samples (burn-in). I added this feature and now I think things are working much better.
I also added tests, which are disabled by default due to slowness.
Here is an updated driver program:
import numpy as np
import scipy.sparse
from msmbuilder import mcmc, MSMLib

num_states = 2
alpha = 1
num_steps = 1000000
thin = 500
burn = 50000

C = 1000000 * np.ones((num_states, num_states))
C[0, 0] *= 1000
C[0, 1] += 1

# Reference point estimates: the naive MLE and the reversible MLE.
T0 = MSMLib.estimate_transition_matrix(C)
X0 = MSMLib.mle_reversible_count_matrix(scipy.sparse.csr_matrix(C)).toarray()
T0_rev = MSMLib.estimate_transition_matrix(X0)

sampler = mcmc.TransitionSampler(C)
sampler.sample(num_steps, thin=thin, burn=burn)
T = sampler.mcmc.trace("T")[:].mean(0)
T_sigma = sampler.mcmc.trace("T")[:].std(0)

sampler = mcmc.RateSampler(C)
sampler.sample(num_steps, thin=thin, burn=burn)
T_K = sampler.mcmc.trace("T")[:].mean(0)
T_K_sigma = sampler.mcmc.trace("T")[:].std(0)

sampler = mcmc.ReversibleTransitionSampler(C)
sampler.sample(num_steps, thin=thin, burn=burn)
T_rev = sampler.mcmc.trace("T")[:].mean(0)
T_rev_sigma = sampler.mcmc.trace("T")[:].std(0)

sampler = mcmc.ReversibleRateSampler(C)
sampler.sample(num_steps, thin=thin, burn=burn)
T_K_rev = sampler.mcmc.trace("T")[:].mean(0)
T_K_rev_sigma = sampler.mcmc.trace("T")[:].std(0)

# Compare posterior means against the reference MLEs:
print(T0 - T)
print(T0 - T_K)
print(T0_rev - T_rev)
print(T0_rev - T_K_rev)
I think this is almost ready, so people can comment on what else they want to see.
Here are some things that I didn't implement:
- Estimation with a fixed value of the equilibrium populations.
- Sparse estimation.
- Estimation with a fixed sparsity structure.
These are considerably more difficult, but doable. We can implement them as people find they need them.
So I have been experimenting with NumExpr for speeding up likelihood calculations (this is for my ensemble stuff, but it would also apply here and to MLE code in MSMB). For expressions involving vectorized logarithms and algebra, it's trivial to get a ~3X speedup by just throwing in something like
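(The actual snippet didn't survive in the thread; for illustration, here is the kind of substitution meant, applied to a multinomial log-likelihood. The array names are hypothetical.)

import numpy as np
import numexpr as ne

C = np.random.rand(2000, 2000)          # counts (illustrative)
T = np.random.rand(2000, 2000) + 1e-10  # probabilities (illustrative)

logl_np = np.sum(C * np.log(T))           # plain NumPy: several temporaries
logl_ne = ne.evaluate("sum(C * log(T))")  # NumExpr: one multithreaded pass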
Sweet. numexpr is already a dependency, via tables, so this is all upside.
@kyleabeauchamp, is this identical to what's proposed in J. Chem. Phys. 128, 244103 (2008)? If not, what are the differences?
Similar, but not identical. I think pymc uses normally distributed proposal steps, with a proposal scale that is tuned to optimize the acceptance rate.
In Noe's paper, he analytically calculates a bunch of proposal distributions.
For a model that is on the boundary of the feasible space, I bet Frank's stuff works better. For models in the interior, I bet ours might be better. I'm not sure--I'm not an expert.
I think the real value in my method is simplicity. All I had to do was find a parameterization with the correct constraints (detailed balance, normalization, rate normalization, etc.); pymc does the rest.
We could possibly get performance enhancements by using a Gibbs-sampling-type approach, where we work out the proposal distributions by hand. That would be more work...
But it's sampling the same distributions. The difference, you're saying, is just efficiency?
Well, there's also the question of prior.
For a general transition matrix, the Dirichlet prior is pretty standard. I don't think it's as clear-cut how to parameterize the prior for constrained models or for rate models.
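For reference, this is just the standard conjugate update: with independent Dirichlet($\alpha$) priors on the rows and a multinomial likelihood for the counts, row $i$'s posterior is Dirichlet($C_{i\cdot} + \alpha$), giving the posterior mean

$\mathbb{E}[T_{ij} \mid C] = \frac{C_{ij} + \alpha_j}{\sum_k (C_{ik} + \alpha_k)}$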
Fair.
I think there's a lot that could be done to generalize and accelerate this code, but I think it might be wise to wait to see how we plan to build models in the future. In particular, large sparse models are going to be a pain in the ass. If we can avoid them using some sort of TICA trick, then we can really use MCMC throughout the pipeline.
Yeah, I'm okay with slow code really. I think we want flexibility and generality first. We can deal with speed later.
Regarding speed, there are several things that might be rate limiting, depending on what regime we work in:
- Large numbers of pymc Potential function calls (fix with Cython?)
- Large numbers of exp() calls with small inputs (fix with Cython)
- exp() calls with large inputs (fix with NumExpr)
- Dense or sparse matrix math
OK, so it appears that I will likely need to rewrite this code to better handle NaNs arising from zero counts.
Argh. So I think there's a bug in pymc that won't let you initialize a Dirichlet RV at a boundary point. With alpha = 1 (i.e. a uniform prior), such a point is perfectly valid, but the upshot is that we cannot use the unregularized MLE as the starting point for MCMC.
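One plausible workaround (my suggestion, not something in the PR) is to nudge the MLE slightly into the interior of the simplex before handing it to pymc:

import numpy as np

def interior_point(T_mle, eps=1e-6):
    """Mix each row of the MLE with the uniform distribution so no entry
    sits exactly on the simplex boundary."""
    n = T_mle.shape[1]
    T0 = (1.0 - eps) * T_mle + eps / n
    return T0 / T0.sum(axis=1, keepdims=True)  # re-normalize rows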