PyEMMA icon indicating copy to clipboard operation
PyEMMA copied to clipboard

provide convenience function to plot TICA.timescales (eg. like for ITS)

Open marscher opened this issue 7 years ago • 3 comments

maybe we can simply change the plot_timescales function to accept a list of TICA or a list of timescale arrays.

marscher avatar May 22 '17 19:05 marscher

Yes, I think that is along the lines of what Chris has suggested. I think Chris also suggested to pass a list of lagtimes to the estimator API function instead of calling an extra function. That would return a list of estimators and they could be passed on to plot_timescales.

Am 22/05/17 um 21:46 schrieb Martin K. Scherer:

maybe we can simply change the plot_timescales function to accept a list of TICA or a list of timescale arrays.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/1099, or mute the thread https://github.com/notifications/unsubscribe-auth/AGMeQl6FPo8_zcNbAlDkaoZNsCB1uDmhks5r8eYqgaJpZM4NizTD.

--


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de Mail: Arnimallee 6, 14195 Berlin, Germany

franknoe avatar May 22 '17 20:05 franknoe

I think returning multiple models/estimators upon a parameter is very confusing and also breaks the API. Why don't we extend ITS to do handle TICA too?

marscher avatar May 25 '17 19:05 marscher

That was the original plan and would work of course. The disadvantage with this approach is that in order to make it very straightforward to use we need a separate convenience function for every estimator, such as timescales_msm, timescales_tica etc, which becomes unscalable. Also we don't have to break the API, because if only a single lagtime is passed, only a single estimator (not a 1-element list) would be returned, which is the current behavior.

In any case we should come up with a general way of doing this, because the TRAM estimators currently use the list-of-estimators approach.

The sklearn-way of doing this would probably be

estimator = coordinates.TICA(...)
its = timescales(X, lags = [1, 2, 4, 8, 16, 32])
plots.plot_implied_timescales(its)

Here timescales would call .fit(X) on the estimator for each lag time, and could either return the list of estimators or a condensed version of the resulting information, e.g. an array with timescales. This is similar to what we have now, just that in order to avoid that the user has to construct the object we combine the first two steps into a single API function call. For TICA that might look like this:

its = coordinates.timescales_tica(X, lags = [1, 2, 4, 8, 16, 32])
plots.plot_implied_timescales(its)

But this is where the problem lies. The sklearn-style approach is extremely flexible and modular, but the user has to work with Estimator classes. The current PyEMMA approach is easier for users not used to OO programming, as all estimation can be done with the API functions, but if we want to stick with it, we get into trouble because we'll need a combinatorially growing number of API functions. Chris' approach is quite elegant, because the parameter scanning (in this case over the lag time) is built into the standard estimation function. This approach would look as follows. Current single-estimator call:

tica = coordinates.tica(X, lag = 1)
# do something with tica object

whereas the ITS would be calculated like this:

ticas = coordinates.tica(X, lag = [1, 2, 4, 8, 16, 32])
plots.plot_implied_timescales(ticas)

So we can compute implied timescales without any extra functions - quite elegant. In this approach we could eventually remove the msm.timescales_msm and msm.timescales_hmm functions. However, the same could be achieved by using standard Python:

ticas = [coordinates.tica(X, lag = lag) for lag in [1, 2, 4, 8, 16, 32]]
plots.plot_implied_timescales(ticas)

I'm currently not sure which one I prefer. In any case it would be beneficial if the plot_implied_timescales() function accepts a list of estimators.

Am 25/05/17 um 21:35 schrieb Martin K. Scherer:

I think returning multiple models/estimators upon a parameter is very confusing and also breaks the API. Why don't we extend ITS to do handle TICA too?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/markovmodel/PyEMMA/issues/1099#issuecomment-304102778, or mute the thread https://github.com/notifications/unsubscribe-auth/AGMeQltndw7lWMxPNix3K_xGPXtj3OPUks5r9df-gaJpZM4NizTD.

--


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de Mail: Arnimallee 6, 14195 Berlin, Germany

franknoe avatar May 25 '17 21:05 franknoe