AdvancedMH.jl icon indicating copy to clipboard operation
AdvancedMH.jl copied to clipboard

Adaptive Metropolis algorithm

Open kaandocal opened this issue 3 years ago • 30 comments

I've implemented a simple version of the standard AM algorithm as a small extension to #39 (this PR is independent of #39). The pull request features a multivariate Gaussian proposal which uses a rescaled version of the posterior covariance (Haario et al.'s famous 2.38 rule) + a small epsilon term for positive-definiteness and initial exploration. In order to update the proposal I had to create a new sampler class, AMSampler, which is basically MetropolisHastings but updates the proposal at each sampling step via the trackstep function. The default implementation of trackstep does nothing.

The update mechanism is slightly different from that in #39: this one performs adaptation in AbstractMCMC.step whereas #39 does it in propose. I think the new AMSampler class should be able to handle both, and since it consists of minimal modifications to the original MetropolisHastings it could provide a general interface for adaptive MH algorithms.

Notes:

  • Proposals are shared between threads - how can this be changed?
  • I haven't implemented all overloads of propose for AMSampler but this should not be an issue.
  • This PR would benefit from #38 to make the trackstep interface more uniform.
  • The test example is minimal, I would welcome any suggestions!

kaandocal avatar May 26 '21 10:05 kaandocal

Is it possible to just add a new proposal and extend the existing sampler, if needed, but not add a new sampler? I think this would simplify the code and also be easier if one wants to combine different proposals.

I am also wondering if trackstep is needed at all - maybe one could just update the proposal right before proposing a new sample in the propose step?

devmotion avatar May 26 '21 11:05 devmotion

Is it possible to just add a new proposal and extend the existing sampler, if needed, but not add a new sampler? I think this would simplify the code and also be easier if one wants to combine different proposals.

I didn't want to modify existing code in order to not break something but the changes could be trivially implemented in MetropolisHastings! I can move things there if that'd be better.

I am also wondering if trackstep is needed at all - maybe one could just update the proposal right before proposing a new sample in the propose step?

I wanted to leave users the option of querying the sampler freely for new proposals without changing the state, hence why I thought updating the proposal in step would be a good idea. Furthermore the accept/reject decision is taken in step, so the proposed samples might not end up in the chain - the AM algorithm only uses samples in the chain to do its adaptation.

kaandocal avatar May 26 '21 11:05 kaandocal

Furthermore the accept/reject decision is taken in step, so the proposed samples might not end up in the chain - the AM algorithm only uses samples in the chain to do its adaptation.

My suggestion was not to update the proposal based on the proposed sample but on the previous sample - this should always be the accepted sample. And since the previous sample is available in the propose step, I thought this could be sufficient.

devmotion avatar May 26 '21 11:05 devmotion

I see, so like the approach in #39 ! I personally think having a separate callback interface for adaptation would be cleaner, and being able to track acceptances/rejections might be useful for some algorithms. And compared to the accepted! callback in #39 this should be a bit more consistent/flexible.

kaandocal avatar May 26 '21 11:05 kaandocal

I'm not strictly against adding such a function but there is one major problem: transitions contain all parameters but only for a subset of them (e.g. just one parameter) one might want to use the adaptive proposal. Thus in general it is not correct to call trackstep with the AMProposal and a Transition, it should be called only with the corresponding sample.

One way to achieve this is to handle it in propose. Another one would be to iterate through samples and proposals.

In particular due to these mixed cases, I think it would be better to only add a new proposal but not a new sampler.

devmotion avatar May 26 '21 12:05 devmotion

Another advantage of handling it in the propose step is that you don't have to construct the MvNormal proposal, which probably is not very efficient.

devmotion avatar May 26 '21 12:05 devmotion

Slightly unrelated thought:

being able to track acceptances/rejections might be useful

I think probably one would not want to keep track of these statistics in a separate trackstep function - otherwise every proposal would have to reimplement the tracking functionality. Probably it would be better to hardcode it in the Transition object and the implementation of step, and then possibly forward it to the proposals if needed.

devmotion avatar May 26 '21 12:05 devmotion

Slightly unrelated thought:

being able to track acceptances/rejections might be useful

I think probably one would not want to keep track of these statistics in a separate trackstep function - otherwise every proposal would have to reimplement the tracking functionality. Probably it would be better to hardcode it in the Transition object and the implementation of step, and then possibly forward it to the proposals if needed.

I agree, which is why I think implementing #38 would be useful! It shouldn't be a big change but probably belongs in another pull request

kaandocal avatar May 26 '21 12:05 kaandocal

I'm not strictly against adding such a function but there is one major problem: transitions contain all parameters but only for a subset of them (e.g. just one parameter) one might want to use the adaptive proposal. Thus in general it is not correct to call trackstep with the AMProposal and a Transition, it should be called only with the corresponding sample.

One way to achieve this is to handle it in propose. Another one would be to iterate through samples and proposals.

I have just merged AMSampler into MetropolisHastings. I haven't seen multiple independent adaptive proposals in the wild, but this implementation just calls trackstep for every proposal in case of AbstractArrays or NamedTuples. Note that acceptances/rejections are not determined by a single proposal, but by a combination of all of them so having more than one adaptive proposal might not work very well.

kaandocal avatar May 26 '21 13:05 kaandocal

Any way we could reuse the Welford implementation from AdvancedHMC?

kaandocal avatar May 26 '21 16:05 kaandocal

By the way, it would be easy to combine #39 with this pull request, but we should maybe fix nomenclature for the methods as the names almost overlap - any suggestions?

kaandocal avatar May 26 '21 16:05 kaandocal

Any way we could reuse the Welford implementation from AdvancedHMC?

Only if it's moved to a separate package I assume.

devmotion avatar May 26 '21 17:05 devmotion

By the way - and this is embarassing - #39 already implemented pretty much the same adaptive algorithm. I didn't realise this until now, so this PR is a bit of a duplicate really. Apologies to @arzwa.

kaandocal avatar May 26 '21 17:05 kaandocal

There's a subpackage called Adaptation in AdvancedHMC - for now we could probably borrow code from there (this should be permitted by the BSD-3 and MIT Licences). So we could merge that computation of the sample covariance matrix with this and #39.

One idea would be basically copying the interface in AdvancedHMC, ie. having an abstract Adaptor struct with initialize!(adaptor, proposal, initial_sample, model) and update!(adaptor, proposal, sample_prev, sample, model, accepted). That way we could avoid creating new Proposal structs and cleanly separate the adaptation logic.

kaandocal avatar May 26 '21 17:05 kaandocal

How about moving the AdvancedHMC mass matrix adaption code into AbstractMCMC.Adaption submodule?

yebai avatar May 26 '21 17:05 yebai

How about moving the AdvancedHMC mass matrix adaption code into AbstractMCMC.Adaption submodule?

It might make sense to define an abstract Adaptation interface in AbstractMCMC. As far as I know AdvancedHMC does not use AbstractMCMC though.

kaandocal avatar May 26 '21 17:05 kaandocal

There is a PR in progress - see https://github.com/TuringLang/AdvancedHMC.jl/pull/259

yebai avatar May 26 '21 17:05 yebai

By the way - and this is embarassing - #39 already implemented pretty much the same adaptive algorithm. I didn't realise this until now, so this PR is a bit of a duplicate really. Apologies to @arzwa.

No problem at all, that PR got stalled due to myself not really finding the time to finish it off decently (it worked for what I needed, and I got sidetracked).

arzwa avatar May 26 '21 17:05 arzwa

I created a new branch adaptive (available on my GitHub) which implements a more generic interface (I don't want to spam PRs).

Main changes:

  • MetropolisHastings has a second field named adaptor
  • AbstractAdaptor{ProposalType} as an abstract type for adaptors with methods initialize! and update!
  • NoAdaptor{PT} <: AbstractAdaptor{PT} as a default no-op adaptor
  • Adaptors can be combined for arrays/named tuples of proposals (yielding an array/named tuple of adaptors)
  • AMAdaptor for symmetric MvNormal random walk proposals

Not sure if the type interface is the cleanest, I would appreciate any feedback!

Note: I'm not sure if we really want to support multiple adaptives for arrays/tuples of proposals, this feature could be removed...

kaandocal avatar May 26 '21 20:05 kaandocal

Intuitively I would have thought adaptor should be part of an AdaptiveProposal but not a general property of MetropolisHastings. That seems a bit cleaner but maybe you noticed some advantages if it is part of the sampler?

devmotion avatar May 26 '21 20:05 devmotion

Not really to be honest! I tried to see if it would be cleaner but at least my new "solution" is more complicated than what we have already. Unless you or @cpfiffer see any advantages in separating out the adaptor logic (I don't really) we can just keep it as it is.

kaandocal avatar May 26 '21 20:05 kaandocal

Unless you or @cpfiffer see any advantages in separating out the adaptor logic (I don't really) we can just keep it as it is.

For right now, I think it's fine to just go as-is. We can add in more general adaptation interfaces later on.

cpfiffer avatar May 26 '21 20:05 cpfiffer

I have uploaded a new version that uses Welford's algorithm. The code now more or less covers the AdaptiveMvNormal features in #39 . The current adaptation interface is via trackstep! (added a bang).

kaandocal avatar May 26 '21 22:05 kaandocal

Cool. I like it. I've turned on the test suite so we can see how the tests run.

cpfiffer avatar May 27 '21 00:05 cpfiffer

Cool. I like it. I've turned on the test suite so we can see how the tests run.

There was an error due to my using a different version of LinearAlgebra.jl, the AdaptiveMH test should now run correctly (hopefully)

kaandocal avatar May 27 '21 08:05 kaandocal

I'll approve for now. Could you add a little bit of usage info to the README file?

cpfiffer avatar May 27 '21 14:05 cpfiffer

@arzwa We should also incorporate your univariate adaptive proposal. Any suggestions for a name? We can also change AMProposal.

kaandocal avatar May 28 '21 10:05 kaandocal

@kaandocal Are you still interested in getting this PR merged?

ParadaCarleton avatar Dec 28 '22 22:12 ParadaCarleton

We should probably get this PR out of the way. I did not pursue this much further because I had trouble getting it to work on my sampling problems - this approach needs a lot of samples to get a decent estimate of the covariance matrix and seems quite fickle in practice. Due to the nature of MH it just takes a while to explore the target distribution even if you have a decent covariance matrix... AdvancedHMC does a fantastic job at adaptation in my experience, and this is nowhere near that. While it is quite different @ParadaCarleton I do recommend having a look at Emcee in the meanwhile.

kaandocal avatar Jan 02 '23 08:01 kaandocal

We should probably get this PR out of the way. I did not pursue this much further because I had trouble getting it to work on my sampling problems - this approach needs a lot of samples to get a decent estimate of the covariance matrix and seems quite fickle in practice. Due to the nature of MH it just takes a while to explore the target distribution even if you have a decent covariance matrix... AdvancedHMC does a fantastic job at adaptation in my experience, and this is nowhere near that. While it is quite different @ParadaCarleton I do recommend having a look at Emcee in the meanwhile.

Yep, I totally get that.

The reason an AMH sampler would be useful is less to do with me wanting to use AMH itself, and more to do with AMH being a useful building block in other algorithms. I think @fipelle mentioned an interest in this earlier?

ParadaCarleton avatar Jan 08 '23 22:01 ParadaCarleton