Examples and data for performing path similarity analysis (PSA).
.. -*- mode: rst; coding: utf-8 -*-

===================================
 Path Similarity Analysis Tutorial
===================================

|zenodo|
:Author: Sean Seyler
:Year: 2015
:License: GNU Public Licence, version 3 (or higher)
:Copyright: © 2015 Sean Seyler
:Citation: Seyler SL, Kumar A, Thorpe MF, Beckstein O (2015).
           Path Similarity Analysis: A Method for Quantifying Macromolecular Pathways.
           PLoS Comput Biol 11 (10): e1004568. doi: `10.1371/journal.pcbi.1004568`_

.. |zenodo| image:: https://zenodo.org/badge/13219/Becksteinlab/PSAnalysisTutorial.svg
   :target: https://zenodo.org/badge/latestdoi/13219/Becksteinlab/PSAnalysisTutorial


Summary
=======

Path Similarity Analysis (PSA) is a computational framework for the quantitative comparison of macromolecular transition paths [Seyler2015]_. This tutorial provides several examples that use PSA to compare closed-to-open transition paths of adenylate kinase (AdK) generated by a selection of algorithms [Seyler2014]_. Hierarchical clustering serves as a simple but powerful approach to exploratory data analysis: the quantitative comparison is represented as a heat map-dendrogram.


Background
==========

PSA, or PSAnalysis, is based on measuring the geometric similarity of transition paths in configuration space using the Hausdorff and Fréchet path metrics. PSA takes advantage of MDAnalysis_ [Michaud-Agrawal2011]_ to provide a seamless interface to Python and NumPy arrays, and a mechanism for performing path comparisons using arbitrary atom selections. MDAnalysis also provides a format-agnostic framework for reading simulation trajectories, allowing rapid comparison of many different computational methods. More information about the method can be found in [Seyler2015]_.
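
To make the two path metrics concrete, the following is a minimal NumPy sketch of the discrete Hausdorff and discrete Fréchet distances. It is not the ``MDAnalysis.analysis.psa`` implementation: the function names are hypothetical, paths are assumed to be arrays of shape ``(n_frames, n_atoms, 3)`` that are already superimposed, and the RMSD between frames is assumed as the point-wise distance.

```python
import numpy as np


def rmsd_matrix(P, Q):
    """Pairwise RMSD between all frames of P (n, N, 3) and Q (m, N, 3)."""
    n_atoms = P.shape[1]
    diff = P[:, None, :, :] - Q[None, :, :, :]  # shape (n, m, N, 3)
    return np.sqrt((diff ** 2).sum(axis=(2, 3)) / n_atoms)


def discrete_hausdorff(P, Q):
    """Largest nearest-neighbor distance, taken over both directions."""
    d = rmsd_matrix(P, Q)
    return max(d.min(axis=1).max(), d.min(axis=0).max())


def discrete_frechet(P, Q):
    """Discrete Fréchet distance via the standard dynamic-programming table."""
    d = rmsd_matrix(P, Q)
    n, m = d.shape
    ca = np.empty((n, m))
    ca[0, 0] = d[0, 0]
    for i in range(1, n):
        ca[i, 0] = max(ca[i - 1, 0], d[i, 0])
    for j in range(1, m):
        ca[0, j] = max(ca[0, j - 1], d[0, j])
    for i in range(1, n):
        for j in range(1, m):
            best_previous = min(ca[i - 1, j], ca[i, j - 1], ca[i - 1, j - 1])
            ca[i, j] = max(best_previous, d[i, j])
    return ca[n - 1, m - 1]


# Toy data (assumption): a one-atom "path" along x, and the same path reversed.
P = np.zeros((4, 1, 3))
P[:, 0, 0] = np.arange(4.0)
Q = P[::-1].copy()

d_H = discrete_hausdorff(P, Q)  # 0.0: both paths visit the same points
d_F = discrete_frechet(P, Q)    # 3.0: Fréchet is sensitive to direction
print(d_H, d_F)
```

The toy example illustrates the main qualitative difference between the metrics: the Hausdorff distance ignores the order in which frames are visited, whereas the Fréchet distance does not.
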

Usage
=====

This tutorial demonstrates a straightforward application of PSA to a set of transitions of the enzyme adenylate kinase (AdK) generated by a selection of methods (for more background on this particular example see [Seyler2014]_). Two example Python scripts are provided to generate an all-pairs distance comparison between the paths (i.e., all unique pairwise distances): a short version shows how to perform similarity analysis on a set of trajectories that have been pre-processed for proper (frame-by-frame) structural alignment; a full version additionally demonstrates, using the PSA framework, how an alignment procedure would be performed prior to similarity analysis. A third script demonstrates Hausdorff pairs analysis, which lets users explore how paths differ from each other as a function of progress and examine, for each pair of paths, the pair of structures responsible for the Hausdorff distance.


Scripts
-------

Analyses are performed by executing the ``psa_short.py``, ``psa_full.py``, or
``psa_hausdorff-pairs.py`` Python scripts, which automatically read trajectories
from the ``methods`` directory into a PSA object and, in the case of
``psa_full.py``, perform trajectory alignment. ``psa_short.py`` and
``psa_full.py`` generate discrete Hausdorff and Fréchet distance matrices, and
produce heat map-dendrograms and annotated heat maps representing the distance
matrices after Ward hierarchical clustering. ``psa_hausdorff-pairs.py`` performs
a Hausdorff pairs (nearest neighbor) analysis and produces two plots showing the
nearest neighbor structures as a function of (normalized) frame progress for
two pairs of paths (DIMS vs DIMS and DIMS vs rTMD-S).


Interactive notebooks
---------------------

Also provided are Jupyter notebooks (with the ``.ipynb`` extension) that give
users the option to perform the same analyses as performed by the scripts in an
interactive, step-by-step manner.


PairID identifier class
~~~~~~~~~~~~~~~~~~~~~~~

The notebooks contain optional analyses (not in the scripts) demonstrating how
to utilize a convenience class called ``PairID``
(provided in ``pair_id.py``). ``PairID`` provides an intuitive interface to
extract data generated by PSA; the Jupyter notebook called
`psa_identifier_example.ipynb`_ demonstrates how it's used. All other notebooks make use
of the ``PairID`` class.
Basic PSA
~~~~~~~~~

The `psa_short.ipynb`_ notebook goes through the basic steps of PSA:

1. Prepare and superimpose trajectories appropriately.
2. Compute Fréchet or Hausdorff distances between all trajectories and generate
   a clustered distance matrix.

It uses the same data that were used to prepare the comparison of multiple fast
transition path sampling methods shown in `Figure 6`_ in [Seyler2015]_.

.. _Figure 6: http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004568
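
The second step above can be sketched independently of MDAnalysis. The example below is an illustration under stated assumptions rather than the notebook's code: the helper names and the toy two-dimensional paths are hypothetical, the Hausdorff distance uses plain Euclidean distances between frames, and clustering uses SciPy's ``linkage`` with ``method="ward"``. Ward linkage is formally defined for Euclidean distances, so applying it to a path-metric matrix should be read as an exploratory heuristic.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, leaves_list, linkage
from scipy.spatial.distance import squareform


def hausdorff(P, Q):
    """Discrete Hausdorff distance between paths of shape (n_frames, n_dims)."""
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())


def distance_matrix(paths, metric):
    """Symmetric matrix of all unique pairwise path distances."""
    n = len(paths)
    D = np.zeros((n, n))
    for i in range(n - 1):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = metric(paths[i], paths[j])
    return D


def toy_path(offset, n_frames=20):
    """Hypothetical straight 2D path at a fixed height `offset`."""
    t = np.linspace(0.0, 1.0, n_frames)
    return np.column_stack([t, np.full_like(t, offset)])


# Two groups of similar paths (toy data, not the AdK trajectories).
labels = ["A1", "A2", "B1", "B2"]
paths = [toy_path(offset) for offset in (0.0, 0.1, 5.0, 5.1)]

D = distance_matrix(paths, hausdorff)
Z = linkage(squareform(D), method="ward")           # condensed matrix as input
order = leaves_list(Z)                              # row order for a clustered heat map
clusters = fcluster(Z, t=2, criterion="maxclust")   # cut the tree into two clusters

print([labels[i] for i in order])
print(dict(zip(labels, clusters)))
```

Reordering the rows and columns of ``D`` by ``order`` gives the clustered distance matrix that a heat map-dendrogram displays.
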
Hausdorff-Pair analysis
~~~~~~~~~~~~~~~~~~~~~~~
The `psa_hausdorff-pairs.ipynb`_ notebook demonstrates how to extract molecular detail
from a path comparison: it yields the two frames (one from each trajectory) that are
responsible for the largest difference between the two trajectories, as described in
more detail in [Seyler2015]_. It then shows how to compare the distance between trajectories
along a common order parameter.
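
The idea behind a Hausdorff pair can be sketched with NumPy alone. This is a hedged illustration, not the notebook's or MDAnalysis's implementation: the function names and the toy two-dimensional paths are hypothetical, and Euclidean distance between frames stands in for the structural distance. The sketch returns the pair of frames that realizes the Hausdorff distance and a nearest-neighbor distance profile along normalized frame progress.

```python
import numpy as np


def pairwise_distances(P, Q):
    """Euclidean distances between all frames of P (n, dims) and Q (m, dims)."""
    return np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)


def hausdorff_pair(P, Q):
    """Return (i, j, distance) for the frames P[i], Q[j] that realize
    the discrete Hausdorff distance between P and Q."""
    d = pairwise_distances(P, Q)
    nn_pq = d.min(axis=1)  # for each frame of P: distance to its nearest frame in Q
    nn_qp = d.min(axis=0)  # for each frame of Q: distance to its nearest frame in P
    i_far = int(nn_pq.argmax())
    j_far = int(nn_qp.argmax())
    if nn_pq[i_far] >= nn_qp[j_far]:
        i, j = i_far, int(d[i_far].argmin())
    else:
        i, j = int(d[:, j_far].argmin()), j_far
    return i, j, float(d[i, j])


def nearest_neighbor_profile(P, Q):
    """Nearest-neighbor distance from each frame of P to Q,
    indexed by normalized frame progress in [0, 1]."""
    d = pairwise_distances(P, Q)
    progress = np.linspace(0.0, 1.0, len(P))
    return progress, d.min(axis=1), d.argmin(axis=1)


# Toy data (assumption): a straight path and a path with a bump in the middle.
t = np.linspace(0.0, 1.0, 11)
P = np.column_stack([t, np.zeros_like(t)])
Q = np.column_stack([t, np.sin(np.pi * t)])

i, j, dist = hausdorff_pair(P, Q)
progress, nn_dist, nn_index = nearest_neighbor_profile(P, Q)
print(i, j, dist)
```

Plotting ``nn_dist`` against ``progress`` shows where along the transition two paths deviate most, while ``(i, j)`` identifies the frames to inspect structurally.
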

.. nbviewer links

.. _psa_identifier_example.ipynb:
   http://nbviewer.ipython.org/github/Becksteinlab/PSAnalysisTutorial/blob/master/psa_identifier_example.ipynb
.. _psa_short.ipynb:
   http://nbviewer.ipython.org/github/Becksteinlab/PSAnalysisTutorial/blob/master/psa_short.ipynb
.. _psa_hausdorff-pairs.ipynb:
   http://nbviewer.ipython.org/github/Becksteinlab/PSAnalysisTutorial/blob/master/psa_hausdorff-pairs.ipynb

Script usage
------------

The scripts can be run directly, for example::

   python psa_short.py

Various settings can be customized, as described below. Furthermore, these
scripts can be used as a basis to implement one's own custom analysis.

Customizing the examples
------------------------

The user can also try adjusting settings in each file to change, for example,
the:

* path metric (default: discrete Fréchet [``discrete_frechet``])
* linkage algorithm for hierarchical clustering (default: ``Ward``)
* name and location of the plot (default: ``df_ward_psa-[short/full].pdf``)

These examples should serve as a sufficient basis for understanding PSA's framework.
Some other techniques and analyses using PSA are described in [Seyler2015]_.
Dependencies
============
* MDAnalysis: 0.11.1 or higher
* pandas: 0.16.2 or higher
* seaborn: 0.6.0 or higher
Help
====

If you have questions or problems using the package, ask on
the MDAnalysis user mailing list:
http://groups.google.com/group/mdnalysis-discussion

Contribution
============

This tutorial is still under revision. It will be updated to reflect changes in
the ``MDAnalysis.analysis.psa`` module, but improvements can always be made and
bugs are likely to be present. Users are encouraged to devise their own analyses
using the PSA framework. Feedback on the tutorial and PSA, as well as issue
reports, is welcome and encouraged!

Implementation in MDAnalysis
============================

If you want to write your own code using PSA, use the
``MDAnalysis.analysis.psa`` module, which has been part of MDAnalysis_ since
release 0.10.0, and have a look at the `documentation of the PSA module`_. This
tutorial requires the PSA implementation in MDAnalysis release 0.11.1 or higher
for all features to work properly.

.. _documentation of the PSA module:
   http://devdocs.mdanalysis.org/documentation_pages/analysis/psa.html

References
==========
.. Links
.. -----
.. _MDAnalysis: http://www.mdanalysis.org
.. Articles
.. --------

.. [Michaud-Agrawal2011] N. Michaud-Agrawal, E. J. Denning,
   T. B. Woolf, and O. Beckstein. MDAnalysis: A toolkit for the
   analysis of molecular dynamics simulations. *J Comp Chem*
   **32**:2319-2327, 2011. doi:`10.1002/jcc.21787`_. http://www.mdanalysis.org

.. _`10.1002/jcc.21787`: http://doi.org/10.1002/jcc.21787

.. [Seyler2014] S. L. Seyler and O. Beckstein. Sampling large conformational
   transitions: adenylate kinase as a testing ground. *Mol Simul* **40**:855–877,
   2014. doi:`10.1080/08927022.2014.919497`_

.. _`10.1080/08927022.2014.919497`: http://dx.doi.org/10.1080/08927022.2014.919497

.. [Seyler2015] S. L. Seyler, A. Kumar, M. F. Thorpe, and O. Beckstein.
   Path Similarity Analysis: A Method for Quantifying Macromolecular Pathways.
   *PLoS Comput Biol* **11** (10): e1004568, 2015. doi:`10.1371/journal.pcbi.1004568`_

.. _`10.1371/journal.pcbi.1004568`: http://dx.doi.org/10.1371/journal.pcbi.1004568

.. _`arXiv:1505.04807`: http://arxiv.org/abs/1505.04807