ProLIF Standardization using RDKit does not converge for very long segment of proteins

Open josevlibera2010 opened this issue 3 years ago • 3 comments

I have been trying to compute protein-protein interaction fingerprints (IFPs) between a protein with more than 1000 amino acids and a small peptide of 6 amino acids. To get this to work, I had to manually increase the maximum number of iterations in the function "_rebuild_conjugated_bonds" in "MDAnalysis/converters/RDKit.py", and also split the protein into three pieces and run the analysis on each piece separately.

Jul 22 '22 15:07 josevlibera2010
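For readers who want to reproduce the splitting workaround described above, here is a minimal sketch of how the protein could be divided into contiguous residue ranges. The helper name `chunk_selections` and the residue numbering (1 to 1000) are assumptions for illustration, not part of ProLIF or MDAnalysis; the strings it returns follow the MDAnalysis selection syntax (`resid start:stop`) and would each be passed to `select_atoms` before running the fingerprint on that piece.

```python
def chunk_selections(first_resid, last_resid, n_chunks, base="protein"):
    """Split an inclusive residue range into contiguous selection strings.

    Hypothetical helper: returns one MDAnalysis-style selection string per
    chunk, with chunk sizes differing by at most one residue.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    if last_resid < first_resid:
        raise ValueError("last_resid must be >= first_resid")

    total = last_resid - first_resid + 1
    n_chunks = min(n_chunks, total)  # never create empty chunks
    size, extra = divmod(total, n_chunks)

    selections = []
    start = first_resid
    for i in range(n_chunks):
        # The first `extra` chunks take one additional residue each.
        stop = start + size + (1 if i < extra else 0) - 1
        selections.append(f"{base} and resid {start}:{stop}")
        start = stop + 1
    return selections


# Assumed numbering for illustration: a 1000-residue protein cut in three.
selections = chunk_selections(1, 1000, 3)
for sel in selections:
    print(sel)
```

Each piece is then converted and analysed on its own, so the results for the three pieces have to be merged afterwards. Note that cutting a chain this way breaks the peptide bond at each boundary, which may affect how the boundary residues are converted.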

Hi,

This sounds more like an issue with the RDKitConverter from MDAnalysis, so I suggest that you open an issue with them for this.

Best, Cédric

Aug 12 '22 18:08 cbouy

@cbouy this isn't an issue for this repo, but for the RDKit converter we could probably hard-code the standard AAs instead of using your heuristics to guess bond orders.

This is what I currently do for loading PDB files w/ bond orders: https://github.com/OpenFreeEnergy/pdbinf/blob/main/src/pdbinf/_pdbinf.py#L114

Where the templates are: https://github.com/OpenFreeEnergy/pdbinf/blob/main/src/pdbinf/_standard_AAs.py

So we could either add this as a dependency, or move the hard-coded CIF files over and rewrite the logic to avoid the RDKit dependency (though MDA will probably not mind including RDKit more often).

Oct 11 '23 10:10 richardjgowers

@richardjgowers I started reorganizing the converter code to make it more modular during the UGM hackathon, i.e. you'd be able to provide your own callable to infer bond orders on a mol, so wrapping up a pdbinf-based callable could definitely be an option if I manage to find some time to finish the PR some day.

Oct 11 '23 22:10 cbouy
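To make the idea discussed in the last two comments concrete, here is a rough sketch of what a pluggable bond-order step could look like: a template lookup for known residues, with a fallback callable for everything else. This is not the MDAnalysis or pdbinf API. All names (`TEMPLATES`, `template_inferer`, `heuristic_inferer`, `assign_bond_orders`) are hypothetical, bonds are represented as plain pairs of atom names rather than RDKit objects, and the template table covers only the heavy atoms of GLY and ALA for illustration.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

Bond = Tuple[str, str]                  # pair of atom names within one residue
BondOrders = Dict[FrozenSet[str], int]  # unordered atom pair -> bond order
Inferer = Callable[[str, Iterable[Bond]], BondOrders]

# Hypothetical, deliberately tiny template table (heavy atoms only).
TEMPLATES: Dict[str, BondOrders] = {
    "GLY": {
        frozenset(("N", "CA")): 1,
        frozenset(("CA", "C")): 1,
        frozenset(("C", "O")): 2,
        frozenset(("C", "OXT")): 1,
    },
    "ALA": {
        frozenset(("N", "CA")): 1,
        frozenset(("CA", "C")): 1,
        frozenset(("CA", "CB")): 1,
        frozenset(("C", "O")): 2,
        frozenset(("C", "OXT")): 1,
    },
}


def heuristic_inferer(resname: str, bonds: Iterable[Bond]) -> BondOrders:
    """Stand-in for a heuristic approach: marks every bond as single."""
    return {frozenset(bond): 1 for bond in bonds}


def template_inferer(
    resname: str,
    bonds: Iterable[Bond],
    fallback: Inferer = heuristic_inferer,
) -> BondOrders:
    """Look bond orders up in TEMPLATES; defer to `fallback` for unknown residues."""
    template = TEMPLATES.get(resname)
    if template is None:
        return fallback(resname, bonds)
    orders: BondOrders = {}
    for bond in bonds:
        key = frozenset(bond)
        if key not in template:
            raise KeyError(f"bond {bond} is not in the {resname} template")
        orders[key] = template[key]
    return orders


def assign_bond_orders(
    resname: str,
    bonds: Iterable[Bond],
    inferer: Inferer = template_inferer,
) -> BondOrders:
    """Entry point that accepts any user-supplied inferer callable."""
    return inferer(resname, bonds)


gly_orders = assign_bond_orders("GLY", [("N", "CA"), ("CA", "C"), ("C", "O")])
```

A template lookup like this does a fixed amount of work per residue, so it involves no iterative standardization step that could fail to converge on long chains. The fallback keeps non-standard residues working with whatever heuristic the caller prefers.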
