plip icon indicating copy to clipboard operation
plip copied to clipboard

Analyze feature evolution through trajectory data

Open CoryKornowicz opened this issue 4 years ago • 4 comments

The ideal solution to this problem would be to use mdtraj (pytraj, mdanalysis, ...) or other trajectory manipulating modules to load a simulation, then at each time step run plip. However, plip currently only supports PDB file reading and stdin reading. The problem lies in that this can already be accomplished by u = md.Uiniverse(...) for t in u.trajectory: write PDB file, plip read file, delete file

However, this is extremely memory inefficient, yet doable...but not for any simulation above 100,000 frames.

I have been reversing the object instantiation to see if there is a way to retain all of the original methods, just build a bridging class to translate a universe trajectory (includes atoms, distances, connect records, ...) into the objects needed for plip to run but the methods are independent of file structure and depend on the object initialization (structure.py file).

TL;DR Is there a way to generate the objects needed for plip independently of the PDB file.

CoryKornowicz avatar Nov 28 '20 19:11 CoryKornowicz

This is an interesting feature request and is certainly interesting. However, we will not integrate mdtraj as third party dependency. For now the easiest option would be to split your trajectory into individual PDB files and use a simple shell script like so:

!#/bin/bash
for f in *.pdb
        do
                basename=$(basename $f .pdb)
                dirname=$(dirname $f)
                docker run --rm \
                    -v $(readlink -f $dirname):/results \
                    -w /results \
                    pharmai/plip:latest -f ${basename}.pdb -xy -O > $dirname/$basename.xml
done

This way you obtain PLIP XML files for each of the states, which you can analyze separately. We have used this already for thousands of frames and it never was an issue.

fkaiserbio avatar Nov 30 '20 16:11 fkaiserbio

Thank you for the reply! That is a great point, I think I find myself in the niche of having 1mil+ frames to analyze. I will continue to look into it on the side.

CoryKornowicz avatar Dec 02 '20 19:12 CoryKornowicz

I have committed a version under my fork that utilized ParmEd trajectories. However, it is very hacky, yet works for its purpose. ~0.3 seconds to load and profile a 28,000 atom system for a single step. 100 steps for me took ~29 secs. Nothing fancy Macbook pro-2015, I wonder if threading or process pooling will boost this?

import MDAnalysis as md
import time
from plip.structure.parmed import TrajComplex

u = md.Universe(PDB, DCD, guess_bonds=True)
protocomp = u.select_atoms("protein or resname LIG").convert_to("PARMED")

start = time.time()

pdbcomp = TrajComplex()
pdbcomp.load_traj(protocomp)
pdbcomp.analyze()

print("Timed: %f" % (time.time() - start))

print(pdbcomp.interaction_sets)

Under the hood, all it is doing is a long set of conversions. MDAnalysis trajectory -> ParmEd Structure -> io.StringIO of Pdb Conversion in-memory -> OBMol Molecule from PBD -> pybel Molecule from OBMol -> Lastly Plip structure from parsed pybel Molecule class.

If anyone has experience making SWIG classes from OpenBabel, it would be so much easier to write the conversion to an OBMol in C++ from Trajectory where it can be executed quicker, then we could have a faster parser without touching the internals of Plip.

Thoughts, comments? I would like to polish this up with a real conversion function so that it can be included in PLIP for more options of use.

CoryKornowicz avatar Dec 03 '20 20:12 CoryKornowicz

I did some research and it looks as if the pip package pharmd solved my problem

CoryKornowicz avatar Dec 09 '20 04:12 CoryKornowicz