
ITS/MSM: allow to extend nsamples for error='bayes' a posteriori?

Open gph82 opened this issue 8 years ago • 4 comments

It would be great to be able to keep adding samples to an object without having to re-compute from scratch.

E.g., after having called msm.its with nsamples=10, if I want 10 more samples, I currently (I think) have to restart the estimation (of the whole ITS or of a single MSM object) with nsamples=20, right?

It's a bit in the spirit of https://github.com/markovmodel/PyEMMA/pull/1030
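A minimal toy sketch of the problem being described (`ToySampler`, `estimate`, and `total_draws` are illustrative stand-ins, not PyEMMA API): when the sampler can only estimate from scratch, extending 10 samples to 20 means paying for 30 draws in total.

```python
import random

class ToySampler:
    """Stand-in for a Bayesian transition-matrix sampler that draws
    `nsamples` samples from scratch on every estimate() call."""

    def __init__(self, nsamples):
        self.nsamples = nsamples
        self.samples = []
        self.total_draws = 0  # bookkeeping: how much sampling work was done overall

    def estimate(self, data):
        # no way to extend: every call discards old samples and starts over
        mean = sum(data) / len(data)
        self.samples = [random.gauss(mean, 1.0) for _ in range(self.nsamples)]
        self.total_draws += self.nsamples
        return self

sampler = ToySampler(nsamples=10).estimate([1.0, 2.0, 3.0])
# wanting 10 more samples forces a full restart with nsamples=20 ...
sampler.nsamples = 20
sampler.estimate([1.0, 2.0, 3.0])
# ... so 30 draws were spent to end up with 20 usable samples
```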

gph82 avatar Sep 11 '17 08:09 gph82

Well, in any case one would need to re-run the estimation to pick up a change in the nsamples parameter. We could then check whether any other parameter has changed and, if not, just run the tmatrix sampler again, but we would have to guarantee that nothing else changed or things will break horribly. In general I don't think this is worth the effort. For ITS this was easy, because one only needs to ensure that the same dtrajs are passed again.

marscher avatar Sep 11 '17 10:09 marscher

@gph82 https://github.com/marscher/PyEMMA/tree/bmsm_nsamples_setter check this out. How shall we propagate nsamples into the models encapsulated by ITS? We could add an nsamples property that only does something if the underlying estimator is sampled, but I think that would be confusing for the case where it is not.

marscher avatar Sep 11 '17 13:09 marscher

The key question here is whether there is a simple and standard way to re-fit estimators in an "incremental" fashion, and whether we can implement that for BayesianMSM. I imagine something like partial_fit could work. Please check sklearn for what is standard here.

If that works, then @gph82 can invoke this function on the estimators underlying the ITS object.

Do not change the estimation function of the ITS object itself. ITS is a convenience shortcut that deliberately does not expose all capabilities of the underlying estimator. Also, do not trigger a fit function as a side effect of invoking a setter somewhere - that would be extremely confusing. Please do not change these parts of the code without discussing them first.
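For reference, sklearn's incremental-learning convention is that `fit()` resets state and estimates from scratch, while `partial_fit()` extends the existing model. A minimal sketch of that convention applied to a toy sampler (`IncrementalSampler` and its attributes are hypothetical, not BayesianMSM's actual implementation):

```python
import random

class IncrementalSampler:
    """Toy estimator following sklearn's incremental-learning convention:
    fit() resets and estimates from scratch, partial_fit() extends the
    existing sample set without discarding previous draws."""

    def __init__(self, nsamples=10):
        self.nsamples = nsamples
        self.samples_ = []

    def fit(self, data):
        self.samples_ = []  # full re-estimation: drop all old samples
        return self.partial_fit(data)

    def partial_fit(self, data, nsamples=None):
        n = self.nsamples if nsamples is None else nsamples
        mean = sum(data) / len(data)
        # append new draws instead of recomputing the old ones
        self.samples_.extend(random.gauss(mean, 1.0) for _ in range(n))
        return self

est = IncrementalSampler(nsamples=10).fit([1.0, 2.0, 3.0])
est.partial_fit([1.0, 2.0, 3.0], nsamples=10)  # 20 samples for 20 draws of work
```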

franknoe avatar Sep 11 '17 16:09 franknoe

Agreed about the concern of code obfuscation in this spot. To be clear, the sampling should be implemented as partial_fit in BayesianMSM. However, the user would then have to access the models in ITS and re-run the estimation on them him/herself. It is a question of how convenient we want to make this: if we want to hide the complexity from the user, the setter approach seems the way to go; if we want to keep the code simple, we go for partial_fit and expect the user to be clever about it.
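The user-side workflow under the partial_fit option might look like the sketch below. `ToyBayesianMSM`, `ToyITS`, and the `models` attribute are hypothetical stand-ins for illustration, not PyEMMA's actual classes; the point is that the user loops over the estimators underlying the ITS container and extends each one, passing the same dtrajs again.

```python
class ToyBayesianMSM:
    """Stand-in for a sampled MSM estimator with a partial_fit method."""

    def __init__(self, lag, nsamples=10):
        self.lag = lag
        self.nsamples = nsamples  # samples drawn so far

    def partial_fit(self, dtrajs, nsamples=10):
        # sketch: run the transition-matrix sampler for `nsamples` more draws
        self.nsamples += nsamples
        return self

class ToyITS:
    """Stand-in for an ITS container holding one sampled estimator per lag."""

    def __init__(self, dtrajs, lags):
        self.dtrajs = dtrajs
        self.models = [ToyBayesianMSM(lag) for lag in lags]

its = ToyITS(dtrajs=[[0, 1, 1, 0]], lags=[1, 2, 5])
# the user extends each underlying estimator explicitly, with the same dtrajs
for model in its.models:
    model.partial_fit(its.dtrajs, nsamples=10)
```

This keeps the ITS code untouched, at the price of exposing the loop to the user.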

marscher avatar Jan 17 '18 14:01 marscher