ProLIF Information about the nested dictionary ifp

Hi, I would like to use the ifp nested dictionary directly, but the meaning of some of its keys is not clear to me. Do you have any suggestions or examples of how to use this nested dictionary directly? Also, I'm not able to print the contents of an ifp with the Python 3 print. If I use print( ifpg_ifp ) I obtain the following: <prolif.fingerprint.Fingerprint: 9 interactions: ['Hydrophobic', 'HBAcceptor', 'HBDonor', 'Cationic', 'Anionic', 'CationPi', 'PiCation', 'PiStacking', 'VdWContact'] at 0x7e84a054ffd0>

Thanks.

Saverio

The following is my instance of plf.Fingerprint( count = True ):

ifpg_ifp_dict_DB02322_vina_sf__docked_first_pose.pkl.txt

Apr 07 '25 18:04 xavgit

Hi Saverio,

As seen in this part of the tutorials, the "raw" data is stored in fp.ifp.

fp.ifp is a dictionary mapping each frame/pose index to an IFP dictionary, whose content is described in the link.

For the metadata meaning, can you get a bit more specific?

Apr 07 '25 22:04 cbouy

There's also a helper method on the IFP object (it will be added in the next release) that can be read to better understand what some of the keys refer to: https://github.com/chemosim-lab/ProLIF/blob/master/prolif/ifp.py#L79-L93

Apr 07 '25 22:04 cbouy

Hi, thanks for the immediate reply.

I used the following code, with only one pose in the SDF of the ligand:

ifpg = plf.Fingerprint( count = True ).run_from_iterable( ligand_best_pose , receptor )
pprint( ifpg.ifp )

that returns:

{0: {(ResidueId(UNL, 1, None), ResidueId(ASP, 230, None)): {'VdWContact': ({'distance': 2.7880253568625664, 'indices': {'ligand': (25,), 'protein': (8,)}, 'parent_indices': {'ligand': (25,), 'protein': (3433,)}},)}, (ResidueId(UNL, 1, None), ResidueId(VAL, 231, None)): {'HBDonor': ({'DHA_angle': 165.2765814800489, 'distance': 3.2617467174077848, 'indices': {'ligand': (19, 44), 'protein': (15,)}, 'parent_indices': {'ligand': (19, 44), 'protein': (3452,)}},), 'VdWContact': ({'distance': 2.7375988147932704, 'indices': {'ligand': (25,), 'protein': (15,)}, 'parent_indices': {'ligand': (25,), 'protein': (3452,)}}, {'distance': 2.257757119870645, 'indices': {'ligand': (44,), 'protein': (15,)}, 'parent_indices': {'ligand': (44,), 'protein': (3452,)}}, {'distance': 2.264217602393037, 'indices': {'ligand': (46,), 'protein': (15,)}, 'parent_indices': {'ligand': (46,), 'protein': (3452,)}})}, ............................................................................................................................................................................ 
(ResidueId(UNL, 1, None), ResidueId(MG, 665, None)): {'Anionic': ({'distance': 4.323571307130363, 'indices': {'ligand': (0,), 'protein': (0,)}, 'parent_indices': {'ligand': (0,), 'protein': (10220,)}}, {'distance': 2.6333257377528967, 'indices': {'ligand': (2,), 'protein': (0,)}, 'parent_indices': {'ligand': (2,), 'protein': (10220,)}}), 'VdWContact': ({'distance': 3.4055317419029727, 'indices': {'ligand': (1,), 'protein': (0,)}, 'parent_indices': {'ligand': (1,), 'protein': (10220,)}}, {'distance': 2.6333257377528967, 'indices': {'ligand': (2,), 'protein': (0,)}, 'parent_indices': {'ligand': (2,), 'protein': (10220,)}}, {'distance': 2.9493854091269545, 'indices': {'ligand': (24,), 'protein': (0,)}, 'parent_indices': {'ligand': (24,), 'protein': (10220,)}}, {'distance': 2.0872905857580593, 'indices': {'ligand': (25,), 'protein': (0,)}, 'parent_indices': {'ligand': (25,), 'protein': (10220,)}}, {'distance': 2.8413208160970886, 'indices': {'ligand': (26,), 'protein': (0,)}, 'parent_indices': {'ligand': (26,), 'protein': (10220,)}}, {'distance': 2.8140199474511736, 'indices': {'ligand': (27,), 'protein': (0,)}, 'parent_indices': {'ligand': (27,), 'protein': (10220,)}}, {'distance': 2.4782792531891245, 'indices': {'ligand': (46,), 'protein': (0,)}, 'parent_indices': {'ligand': (46,), 'protein': (10220,)}}, {'distance': 1.9930058493219314, 'indices': {'ligand': (47,), 'protein': (0,)}, 'parent_indices': {'ligand': (47,), 'protein': (10220,)}})}}}

How can I extract the involved residues from the nested dictionary? Some of them are the following:

ResidueId(ASP, 230, None)
ResidueId(VAL, 231, None)
ResidueId(PHE, 286, None)
................................................
ResidueId(MG, 665, None)

From this information, I can use ifp[ "ASP129" ] to get the corresponding interaction results. In my case, why does print( ifpg.ifp['PHE286'] ) raise the following error?
print( ifpg.ifp['PHE286'] ) KeyError: 'PHE286'

One last question. What do 'indices' and 'parent_indices' mean? For the ligand they are the same, whereas for the protein they are different. Example from the output: {'HBDonor': ({'DHA_angle': 165.2765814800489, 'distance': 3.2617467174077848, 'indices': {'ligand': (19, 44), 'protein': (15,)}, 'parent_indices': {'ligand': (19, 44), 'protein': (3452,)}},), I guess that this is an indexing system that identifies the atoms involved in the interaction in the corresponding structure files (PDB, SDF, or some other format). How can I use it to determine those atoms?

Many thanks again.

Saverio

Apr 08 '25 09:04 xavgit

How can I extract the involved residues from the nested dictionary?

Just loop over fp.ifp[0] which yields tuples, and take the second element of each tuple for the protein residue (the first one is the ligand).
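A minimal sketch of that loop, written against stand-in data so it runs without ProLIF installed. The ResidueId namedtuple here is a simplified, hypothetical substitute for ProLIF's own class, and the distances are rounded from the output posted above; only the shape of the dictionary (frame index, then residue-pair tuples) is taken from the thread.

```python
from collections import namedtuple

# Simplified stand-in for prolif's ResidueId (assumption made for this sketch)
ResidueId = namedtuple("ResidueId", ["name", "number", "chain"])

lig = ResidueId("UNL", 1, None)

# Same shape as ifpg.ifp: {frame_index: {(ligand_res, protein_res): {...}}}
ifp = {
    0: {
        (lig, ResidueId("ASP", 230, None)): {"VdWContact": ({"distance": 2.79},)},
        (lig, ResidueId("VAL", 231, None)): {"HBDonor": ({"distance": 3.26},)},
        (lig, ResidueId("MG", 665, None)): {"Anionic": ({"distance": 4.32},)},
    }
}

# Iterating over one frame yields the (ligand, protein) key tuples;
# the second element of each tuple is the protein residue.
protein_residues = [prot_res for (_lig_res, prot_res) in ifp[0]]

# Build plain string labels from the stand-in fields
labels = [f"{res.name}{res.number}" for res in protein_residues]
print(labels)
```

With the real object the same comprehension should apply to ifpg.ifp[0], since iterating a dictionary yields its keys.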

In my case, why does print( ifpg.ifp['PHE286'] ) raise the error: print( ifpg.ifp['PHE286'] ) KeyError: 'PHE286'

As the name of the error says, that key does not exist in the IFP dictionary (in other words, there was no interaction detected with that residue on that specific frame)

What do 'indices' and 'parent_indices' mean?

ProLIF holds 2 versions of each molecule: the whole prolif.Molecule object, and one that is fragmented based on residue identifiers, which you can access with prolif.Molecule["PHE286"]. parent_indices refers to the atom indices in the whole molecule, while indices refers to the atom indices within each residue molecule. For a small molecule there's only a single residue identifier (e.g. LIG0), so both sets of indices are the same.
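To make the distinction concrete, here is a small sketch that reads both index sets from the HBDonor entry posted earlier in the thread. The helper name atom_indices is hypothetical; it only selects between the two keys and does not call ProLIF.

```python
# One interaction entry copied from the thread (ligand vs. VAL231)
hbdonor = {
    "DHA_angle": 165.2765814800489,
    "distance": 3.2617467174077848,
    "indices": {"ligand": (19, 44), "protein": (15,)},
    "parent_indices": {"ligand": (19, 44), "protein": (3452,)},
}


def atom_indices(entry, side, whole_molecule=True):
    """Hypothetical helper: return atom indices for 'ligand' or 'protein'.

    whole_molecule=True  -> indices into the complete molecule (parent_indices)
    whole_molecule=False -> indices into the single-residue fragment (indices)
    """
    key = "parent_indices" if whole_molecule else "indices"
    return entry[key][side]


lig_atoms = atom_indices(hbdonor, "ligand")
prot_atoms_whole = atom_indices(hbdonor, "protein")
prot_atoms_local = atom_indices(hbdonor, "protein", whole_molecule=False)

print(lig_atoms, prot_atoms_whole, prot_atoms_local)
```

In practice, the whole-molecule indices would be looked up on the full protein molecule and the local ones on the residue fragment. How to get from an index to an atom name or element depends on the molecule object's atom-access API, which this thread does not spell out.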

Apr 08 '25 13:04 cbouy

Hi, thanks. Regarding the error for print( ifpg.ifp['PHE286'] ): the ifpg.ifp dict has the following entry for PHE 286:

 (ResidueId(UNL, 1, None), ResidueId(PHE, 286, None)): {'VdWContact': ({'distance': 2.858342337974704,
                                                                        'indices': {'ligand': (17,),
                                                                                    'protein': (12,)},
                                                                        'parent_indices': {'ligand': (17,),
                                                                                           'protein': (4235,)}},
                                                                       {'distance': 2.1334880861998937,
                                                                        'indices': {'ligand': (17,),
                                                                                    'protein': (13,)},
                                                                        'parent_indices': {'ligand': (17,),
                                                                                           'protein': (4236,)}},
                                                                       {'distance': 2.5745843985410075,
                                                                        'indices': {'ligand': (42,),
                                                                                    'protein': (12,)},
                                                                        'parent_indices': {'ligand': (42,),
                                                                                           'protein': (4235,)}})},

So am I forming the string 'PHE286' incorrectly as the argument of ifpg.ifp[ ... ], given that the "section" for this residue is not empty? How must the string be constructed from (ResidueId(UNL, 1, None), ResidueId(PHE, 286, None)) to use it in ifpg.ifp[ ... ]?

Thanks.

Saverio

Apr 08 '25 13:04 xavgit

No, the string is right, but you have to index by frame first, and then by residue.
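The sketch below shows why the first attempt failed, using a plain dict with the same two-level shape. ResidueId is again a simplified stand-in, and for_residue is a hypothetical helper that imitates a lookup by residue label on a single frame; it is not ProLIF's implementation.

```python
from collections import namedtuple

# Simplified stand-in for prolif's ResidueId (assumption made for this sketch)
ResidueId = namedtuple("ResidueId", ["name", "number", "chain"])

ifp = {
    0: {
        (ResidueId("UNL", 1, None), ResidueId("PHE", 286, None)): {
            "VdWContact": (
                {
                    "distance": 2.858342337974704,
                    "indices": {"ligand": (17,), "protein": (12,)},
                    "parent_indices": {"ligand": (17,), "protein": (4235,)},
                },
            )
        }
    }
}

print(list(ifp))        # the outer keys are frame/pose indices
print("PHE286" in ifp)  # so a residue label is not an outer key


def for_residue(frame_ifp, label):
    """Hypothetical helper: keep the residue pairs that involve `label`."""
    return {
        pair: interactions
        for pair, interactions in frame_ifp.items()
        if any(f"{res.name}{res.number}" == label for res in pair)
    }


# Index by frame first, then filter by residue
phe = for_residue(ifp[0], "PHE286")
print(phe)
```

On the real object the equivalent lookup is ifpg.ifp[0]['PHE286'], following the order given above.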

Apr 08 '25 14:04 cbouy

Hi, thanks now it works fine.

One very last question. Do you have any suggestions for splitting the interaction information, for example in the following case?

 (ResidueId(UNL, 1, None), ResidueId(MG, 664, None)): {'Anionic': ({'distance': 4.177593833264258,
                                                                    'indices': {'ligand': (0,),
                                                                                'protein': (0,)},
                                                                    'parent_indices': {'ligand': (0,),
                                                                                       'protein': (10219,)}},
                                                                   {'distance': 3.4068778951148024,
                                                                    'indices': {'ligand': (10,),
                                                                                'protein': (0,)},
                                                                    'parent_indices': {'ligand': (10,),
                                                                                       'protein': (10219,)}},
                                                                   {'distance': 4.322899165036575,
                                                                    'indices': {'ligand': (12,),
                                                                                'protein': (0,)},
                                                                    'parent_indices': {'ligand': (12,),
                                                                                       'protein': (10219,)}}),
                                                       'VdWContact': ({'distance': 3.2603519498481326,
                                                                       'indices': {'ligand': (3,),
                                                                                   'protein': (0,)},
                                                                       'parent_indices': {'ligand': (3,),
                                                                                          'protein': (10219,)}},
                                                                      {'distance': 3.268308813517863,
                                                                       'indices': {'ligand': (7,),
                                                                                   'protein': (0,)},
                                                                       'parent_indices': {'ligand': (7,),
                                                                                          'protein': (10219,)}},
                                                                      {'distance': 2.413134393454925,
                                                                       'indices': {'ligand': (8,),
                                                                                   'protein': (0,)},
                                                                       'parent_indices': {'ligand': (8,),
                                                                                          'protein': (10219,)}},
                                                                      {'distance': 3.0483928252310535,
                                                                       'indices': {'ligand': (13,),
                                                                                   'protein': (0,)},
                                                                       'parent_indices': {'ligand': (13,),
                                                                                          'protein': (10219,)}},
                                                                      {'distance': 2.594174985026052,
                                                                       'indices': {'ligand': (34,),
                                                                                   'protein': (0,)},
                                                                       'parent_indices': {'ligand': (34,),
                                                                                          'protein': (10219,)}})},

That is, into the 'Anionic' section and its components and the 'VdWContact' section and its components? The collected information is detailed and interesting, but it does not seem easy to extract it programmatically from this nested dictionary.

Thanks

Apr 08 '25 14:04 xavgit
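One possible way to split such an entry is to flatten the nested dictionary into one row per interaction occurrence and then group the rows by interaction type. The sketch below does this on a subset of the MG664 entry above (two Anionic and two VdWContact occurrences). ResidueId is a simplified stand-in for ProLIF's class, and flatten is a hypothetical helper, not part of the library.

```python
from collections import namedtuple

# Simplified stand-in for prolif's ResidueId (assumption made for this sketch)
ResidueId = namedtuple("ResidueId", ["name", "number", "chain"])


def occurrence(distance, lig_idx, prot_parent_idx):
    """Build one metadata dict in the shape shown in the thread."""
    return {
        "distance": distance,
        "indices": {"ligand": (lig_idx,), "protein": (0,)},
        "parent_indices": {"ligand": (lig_idx,), "protein": (prot_parent_idx,)},
    }


ifp = {
    0: {
        (ResidueId("UNL", 1, None), ResidueId("MG", 664, None)): {
            "Anionic": (
                occurrence(4.177593833264258, 0, 10219),
                occurrence(3.4068778951148024, 10, 10219),
            ),
            "VdWContact": (
                occurrence(3.2603519498481326, 3, 10219),
                occurrence(3.268308813517863, 7, 10219),
            ),
        }
    }
}


def flatten(ifp):
    """Hypothetical helper: one flat dict per interaction occurrence."""
    rows = []
    for frame, frame_ifp in ifp.items():
        for (lig_res, prot_res), interactions in frame_ifp.items():
            for name, occurrences in interactions.items():
                for meta in occurrences:
                    rows.append({
                        "frame": frame,
                        "ligand": f"{lig_res.name}{lig_res.number}",
                        "protein": f"{prot_res.name}{prot_res.number}",
                        "interaction": name,
                        "distance": meta.get("distance"),
                        "ligand_atoms": meta["parent_indices"]["ligand"],
                        "protein_atoms": meta["parent_indices"]["protein"],
                    })
    return rows


rows = flatten(ifp)

# Group the flat rows by interaction type
by_type = {}
for row in rows:
    by_type.setdefault(row["interaction"], []).append(row)

for name, group in by_type.items():
    print(name, [r["distance"] for r in group])
```

A flat list of rows like this can also be passed to a table library if a tabular view is preferred.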

I'm sorry I don't offer that level of support, but it would be useful for you to share what the reformatted data would look like if you want someone to help

Apr 08 '25 16:04 cbouy

Hi, thanks for your suggestions.

What I would like to use is a set of dicts and/or lists such as:

ifp[ poses ] = number of poses of the ligand.
ifp[ pose_index_i ] = [ res_spec_1 , ... , res_spec_N ] = list of residues for which there is one or more non-empty interactions available from prolif with the ligand.
ifp[ pose_index_i ][ res_spec_j ] = [ interaction_type_1 , ... , interaction_type_M ] = list of non-empty interactions.
ifp[ pose_index_i ][ res_spec_j ][ interaction_type_O ] = list of dicts whose keys are interaction_type_O dependent and known from the docs.

What do you think about this?

Thanks.

Saverio

Apr 08 '25 19:04 xavgit
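A rough sketch of the layout proposed above, assuming a single ligand residue per pose so that the protein residue label alone can serve as the second-level key. ResidueId is a simplified stand-in for ProLIF's class, restructure is a hypothetical helper, and the sample values are a subset of the entries posted in this thread.

```python
from collections import namedtuple

# Simplified stand-in for prolif's ResidueId (assumption made for this sketch)
ResidueId = namedtuple("ResidueId", ["name", "number", "chain"])

lig = ResidueId("UNL", 1, None)
ifp = {
    0: {
        (lig, ResidueId("PHE", 286, None)): {
            "VdWContact": (
                {"distance": 2.858342337974704,
                 "indices": {"ligand": (17,), "protein": (12,)},
                 "parent_indices": {"ligand": (17,), "protein": (4235,)}},
            ),
        },
        (lig, ResidueId("MG", 664, None)): {
            "Anionic": (
                {"distance": 4.177593833264258,
                 "indices": {"ligand": (0,), "protein": (0,)},
                 "parent_indices": {"ligand": (0,), "protein": (10219,)}},
            ),
            "VdWContact": (
                {"distance": 3.2603519498481326,
                 "indices": {"ligand": (3,), "protein": (0,)},
                 "parent_indices": {"ligand": (3,), "protein": (10219,)}},
            ),
        },
    }
}


def restructure(ifp):
    """Hypothetical helper: {pose: {residue_label: {interaction: [dicts]}}}."""
    out = {}
    for pose, frame_ifp in ifp.items():
        per_residue = out.setdefault(pose, {})
        for (_lig_res, prot_res), interactions in frame_ifp.items():
            label = f"{prot_res.name}{prot_res.number}"
            # Keep only interaction types that have at least one occurrence
            per_residue[label] = {
                name: list(occurrences)
                for name, occurrences in interactions.items()
                if occurrences
            }
    return out


nested = restructure(ifp)

n_poses = len(nested)                        # number of poses
residues = list(nested[0])                   # residues interacting in pose 0
types_for_mg = list(nested[0]["MG664"])      # interaction types for one residue
anionic = nested[0]["MG664"]["Anionic"]      # list of metadata dicts

print(n_poses, residues, types_for_mg, len(anionic))
```

The number of poses and the residue and interaction lists fall out of len() and list() on the nested dict, so they do not need to be stored as separate entries.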
