ProLIF icon indicating copy to clipboard operation
ProLIF copied to clipboard

how can i get the ligand atom id which have a interaction with pritein atom?

Open CAODH opened this issue 3 years ago • 3 comments

thanks for your outstnding works ,but how can i find which atom pair between ligand and protein have a interaction? thanks!

CAODH avatar Jun 10 '22 03:06 CAODH

Hi @CAODH,

The best for this is to do it with a dataframe.

Supposing you've already run the interaction calculation, you can then do df = fp.to_dataframe(return_atoms=True). The resulting dataframe will give you ligand and protein atom indices for the corresponding residues and for each interaction.

You can find some more in depth scripts using this in the documentation https://prolif.readthedocs.io/en/latest/notebooks/visualisation.html

Hope that helps, Cedric

cbouy avatar Jun 10 '22 08:06 cbouy

Hi Cédric, me again.

Am I right in understanding that these index numbers relate to the prolif.Residue atom indexing? Do you have any suggestions for how to access the index of the interacting atom in the original .pdb or .mol2 file, where the atomic indices are different to those relative to the prolif.Residue indices?

Thanks, Noah

noahharrison64 avatar Jun 14 '22 16:06 noahharrison64

Indeed, those index correspond to the atom numbering in the residue, not in the input.

To retrieve the index in the actual file, you could try this (assuming you already have the dataframe with atom indices):

from prolif.interactions import get_mapindex

u = mda.Universe(...)
lig = u.select_atoms(...)
prot = u.select_atoms(...)

lmol = plf.Molecule.from_mda(lig)
pmol = plf.Molecule.from_mda(prot)

def convert_index(s):
    lres, pres, _ = s.name
    for i, (li, pi) in enumerate(s):
        if li is None:
            continue
        li = get_mapindex(lmol[lres], li)
        pi = get_mapindex(pmol[pres], pi)
        s[i] = (lig[li].index, prot[pi].index)
    return s

df_index = df.apply(convert_index)

cbouy avatar Jun 14 '22 22:06 cbouy

Hi, I would like to pile on this issue, as I find the mismatch between the two indexing systems highly confusing when dealing with protein-protein interactions. i am looking at the following interaction, which I isolated from a larger p-p complex ('test.pdb'):

ATOM      1  N   TYR A 187      91.367  73.171  56.258  1.00  0.00           N
ATOM      2  HN  TYR A 187      91.352  72.616  57.086  1.00  0.00           H
ATOM      3  CA  TYR A 187      90.893  74.512  56.423  1.00  0.00           C
ATOM      4  HA  TYR A 187      91.009  75.065  55.502  1.00  0.00           H
ATOM      5  CB  TYR A 187      89.431  74.530  56.936  1.00  0.00           C
ATOM      6  HB1 TYR A 187      89.355  74.033  57.926  1.00  0.00           H
ATOM      7  HB2 TYR A 187      89.053  75.570  57.033  1.00  0.00           H
ATOM      8  CG  TYR A 187      88.546  73.791  55.976  1.00  0.00           C
ATOM      9  CD1 TYR A 187      88.119  74.416  54.797  1.00  0.00           C
ATOM     10  HD1 TYR A 187      88.444  75.424  54.581  1.00  0.00           H
ATOM     11  CE1 TYR A 187      87.276  73.742  53.903  1.00  0.00           C
ATOM     12  HE1 TYR A 187      86.938  74.231  53.001  1.00  0.00           H
ATOM     13  CZ  TYR A 187      86.843  72.447  54.194  1.00  0.00           C
ATOM     14  OH  TYR A 187      85.960  71.796  53.320  1.00  0.00           O
ATOM     15  HH  TYR A 187      85.772  70.940  53.712  1.00  0.00           H
ATOM     16  CD2 TYR A 187      88.149  72.467  56.227  1.00  0.00           C
ATOM     17  HD2 TYR A 187      88.508  71.952  57.107  1.00  0.00           H
ATOM     18  CE2 TYR A 187      87.285  71.799  55.350  1.00  0.00           C
ATOM     19  HE2 TYR A 187      86.970  70.792  55.581  1.00  0.00           H
ATOM     20  C   TYR A 187      91.769  75.136  57.482  1.00  0.00           C
ATOM     21  O   TYR A 187      92.303  74.448  58.347  1.00  0.00           O
TER      22      TYR A 187
ATOM     23  N   CYS F   2      85.446  76.051  56.283  1.00  0.00           N
ATOM     24  HN  CYS F   2      85.896  76.803  56.757  1.00  0.00           H
ATOM     25  CA  CYS F   2      85.108  74.901  57.117  1.00  0.00           C
ATOM     26  HA  CYS F   2      85.533  74.018  56.662  1.00  0.00           H
ATOM     27  CB  CYS F   2      85.780  75.087  58.513  1.00  0.00           C
ATOM     28  HB1 CYS F   2      86.820  74.708  58.425  1.00  0.00           H
ATOM     29  HB2 CYS F   2      85.883  76.181  58.679  1.00  0.00           H
ATOM     30  SG  CYS F   2      84.971  74.366  60.001  1.00  0.00           S
ATOM     31  HG1 CYS F   2      86.068  74.221  60.731  1.00  0.00           H
ATOM     32  C   CYS F   2      83.611  74.621  57.180  1.00  0.00           C
ATOM     33  O   CYS F   2      83.184  73.482  57.034  1.00  0.00           O
END

my code is as follows :

mol = Chem.MolFromPDBFile('test.pdb', removeHs=False)
u = mda.Universe(mol)
Ligand = plf.Molecule.from_mda(u, "chainID F and resid 2")
Receptor = plf.Molecule.from_mda(u, "byres chainID A B C D E and around 5 (chainID F and resid 2)")
fp = plf.Fingerprint()
ifp = fp.generate(Ligand, Receptor, return_atoms=True)
ifp["Frame"] = 0
df = plf.to_dataframe([ifp], fp.interactions.keys(),return_atoms=True)

and prolif reports a single interaction between atoms 2 and 4 of protein 1 ("Ligand") and protein2 ("Receptor"), respectively :

ligand           CYS2.F
protein        TYR187.A
interaction Hydrophobic
Frame                  
0                (2, 4)

Using get_mapindex on these atoms leaves the indexes unchanged, meaning that the proposed interaction is between the CA of the CYS residue to the CB of the TYR residue. Looking at the structure, these residues are 4.34Å apart, quite far for a hydrophobic interaction. Indeed, this interaction is not picked up by UCSF-Chimera. However, the CG of the TYR is only 3.78Å from the CA of the CYS and does interact with it, according to the conventional cutoff, and is picked by Chimera. Can this difference between the software suits be an indexing issue? it would be nice if the get_mapindex function would have a flag to return the actual atom names from the PDB file (CA,CB, CG . . . etc) to reduce the confusion level further.

Izhar-Karbat avatar Oct 31 '22 18:10 Izhar-Karbat

Hi @izhar1973,

Looking at the structure, these residues are 4.34Å apart, quite far for a hydrophobic interaction. Indeed, this interaction is not picked up by UCSF-Chimera

By default the Hydrophobic interaction in ProLIF goes up to 4.5Å, but you can modify the distance cutoff by adding this code snippet before creating the plf.Fingerprint object:

class Hydrophobic(plf.interactions.Hydrophobic):
    def __init__(self):
        super().__init__(distance=4.0)

However, the CG of the TYR is only 3.78Å from the CA of the CYS and does interact with it

ProLIF will only return the first atoms that satisfy the SMARTS patterns and geometrical constraints of a given interaction in a given pair of residue, not the "best" (shortest distance) one.

it would be nice if the get_mapindex function would have a flag to return the actual atom names from the PDB file (CA,CB, CG . . . etc)

I think you can do this with one of the following (not 100% sure the will work):

  • Ligand["CYS2.F"].GetAtomWithIdx(2).GetMonomerInfo().GetName() or
  • Ligand.GetAtomWithIdx(get_mapindex(Ligand["CYS2.F"], 2)).GetMonomerInfo().GetName()

but I agree that it could be useful to have!

Hope this answers your questions, Cédric

cbouy avatar Nov 04 '22 01:11 cbouy