How to extract the list of interacting Amino Acid Residues between the protein and the ligand through PDBQT

Open eschoo opened this issue 2 years ago • 3 comments

I have a collection (a .tar archive) of hundreds of PDBQT files containing the docking poses between a protein and a ligand, generated on an HPC cluster. How would I go about extracting the list of interacting amino acid residues for each docked PDBQT file (around 1 million docked files in total)?

eschoo avatar Aug 12 '22 06:08 eschoo

Hi @eschoo,

Have you looked at this part of the documentation? The only catch is that you'll require either a MOL2 file or SMILES to map the bond orders and formal charges to your PDBQT poses for each ligand.

Best, Cédric

cbouy avatar Aug 12 '22 17:08 cbouy

Hi @cbouy ,

Thank you for the help! My issue is how to do this for a large collection of PDBQT files with docked compounds. Does ProLIF have a way of looking into each .tar collection, which is composed of thousands of PDBQT files, obtaining the interacting amino acid residues for each PDBQT file, and recording them? Thank you!

eschoo avatar Aug 14 '22 20:08 eschoo

Sorry for the very late response, I'm quite busy these days!

ProLIF doesn't have a way to directly read files from a tar collection, but the Python standard library has a tarfile module that might help. Unfortunately, MDAnalysis (the underlying library used by the prolif.pdbqt_supplier class) does not support multi-model PDBQT files, so you have to split them so that each file contains a single docking pose. If you've used Vina for docking, there's a vina_split command that you could use to split poses into single files, but that would defeat the purpose of reading directly from a tar file :/
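As a rough illustration of the tarfile route, here is a minimal sketch that uses only the standard library. It reads each .pdbqt member of the archive in memory, splits it on MODEL/ENDMDL records, and writes one file per pose. The function names, the `_poseN` naming scheme, and the assumption that poses are delimited by MODEL/ENDMDL records (as in Vina output) are illustrative choices and not part of ProLIF.

```python
import tarfile
from pathlib import Path


def split_pdbqt_models(text):
    """Split PDBQT text into one string per MODEL/ENDMDL block.

    Assumption: poses are delimited by MODEL/ENDMDL records. Text without
    any MODEL record is returned unchanged as a single pose.
    """
    poses, current, in_model = [], [], False
    for line in text.splitlines():
        if line.startswith("MODEL"):
            in_model, current = True, []
        elif line.startswith("ENDMDL"):
            if in_model:
                poses.append("\n".join(current) + "\n")
            in_model = False
        elif in_model:
            current.append(line)
    if not poses and text.strip():
        poses.append(text if text.endswith("\n") else text + "\n")
    return poses


def extract_single_poses(tar_path, out_dir):
    """Write every pose found in the archive's .pdbqt members to out_dir.

    Returns the list of written paths. Members sharing the same file name
    in different folders of the archive would overwrite each other.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(tar_path) as archive:
        for member in archive:
            if not (member.isfile() and member.name.endswith(".pdbqt")):
                continue
            raw = archive.extractfile(member).read()
            text = raw.decode("utf-8", errors="replace")
            stem = Path(member.name).stem
            for i, pose in enumerate(split_pdbqt_models(text), start=1):
                target = out_dir / f"{stem}_pose{i}.pdbqt"
                target.write_text(pose)
                written.append(target)
    return written
```

The single-pose files written this way could then be passed to prolif.pdbqt_supplier along with the MOL2 or SMILES template for each ligand, as described in the documentation. With around a million poses, it is probably worth processing one archive (or one batch of members) at a time instead of extracting everything up front.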

cbouy avatar Oct 01 '22 15:10 cbouy