How to deal with many short trajectories

Open KrishnenduSinha12 opened this issue 2 years ago • 4 comments

I have about 30 short MD runs, each 50-100 ns long. I would like to build an MSM from this set of trajectories. Should I concatenate the trajectories before performing TICA/PCA-based clustering? I would like to know whether concatenation at the boundaries would lead to any artifacts in (i) TICA-based clustering and (ii) construction of the transition matrix. Thanks.

KrishnenduSinha12, Jan 02 '22 19:01

Hi, concatenation is definitely not the way to go here, as it would indeed lead to artifacts. Just leave them as a list of trajectories and let PyEMMA deal with it internally.

PS: Only the TICA/VAMP methods actually rely on the trajectory structure. PCA, for example, makes no use of the temporal structure of the data (and is not a recommended method in this setting). Clustering itself does not use temporal information either, so in theory the data could be concatenated for that step. But to be on the safe side, just leave them as they are (i.e. as a list of trajectories) and let the library handle it.
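
To illustrate the boundary artifact: any lagged estimator (the time-lagged covariance in TICA, the count matrix of an MSM) is built from pairs of frames (t, t + lag). The following is a minimal pure-Python sketch, not PyEMMA's actual implementation; the helper `count_transitions` and the two toy discrete trajectories are made up for illustration. It shows how concatenation creates pairs that straddle the boundary between two independent runs:

```python
from collections import Counter

def count_transitions(dtrajs, lag):
    """Count (state at t, state at t + lag) pairs, one trajectory at a time.

    Hypothetical helper for illustration only; pairs are never formed
    across two different trajectories in the list.
    """
    counts = Counter()
    for traj in dtrajs:
        for t in range(len(traj) - lag):
            counts[(traj[t], traj[t + lag])] += 1
    return counts

# Two made-up discrete trajectories: one never leaves state 0,
# the other never leaves state 1.
dtrajs = [[0, 0, 0, 0], [1, 1, 1, 1]]
lag = 2

# Kept as a list of trajectories.
separate = count_transitions(dtrajs, lag)

# Concatenated into one long trajectory.
concatenated = count_transitions([dtrajs[0] + dtrajs[1]], lag)

print("0 -> 1 counts, list of trajectories:", separate[(0, 1)])
print("0 -> 1 counts, concatenated:        ", concatenated[(0, 1)])
```

With the list kept intact there are no 0 -> 1 counts. After concatenation there are two, one for each frame within `lag` of the boundary, even though neither run ever left its state.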

clonker, Jan 03 '22 08:01

Thanks. I have another question: if I have several short trajectories of different lengths, will that affect the results? Or is it recommended to use trajectories of similar length?

KrishnenduSinha12, Jan 11 '22 10:01

You can use trajectories of different lengths, no problem. Trajectories shorter than the lag time are skipped.
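
As a rough sketch of why short trajectories drop out: a trajectory with n frames yields n - lag frame pairs at a given lag, so one that is not longer than the lag yields none. The frame counts, the lag, and the helper `usable_pairs` below are hypothetical; this is plain arithmetic, not PyEMMA code:

```python
def usable_pairs(lengths, lag):
    """Number of (t, t + lag) frame pairs each trajectory can contribute.

    Hypothetical helper for illustration; lengths and lag are in frames.
    """
    return [max(n - lag, 0) for n in lengths]

# Made-up frame counts for three runs of different length.
lengths = [5000, 10000, 80]
lag = 100

pairs = usable_pairs(lengths, lag)
for n, p in zip(lengths, pairs):
    print(f"{n} frames -> {p} pairs at lag {lag}")
```

Here the 80-frame run contributes nothing at a lag of 100 frames, while the longer runs lose only the last `lag` frames as starting points. It is worth checking how much data survives at the lag time you intend to use.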

clonker, Jan 11 '22 10:01

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot], Jul 31 '22 01:07