
Standardization using RDKit does not converge for very long protein segments

Open josevlibera2010 opened this issue 3 years ago • 3 comments

I have been trying to compute protein-protein interaction fingerprints (IFPs) between a protein with more than 1000 AAs and a small peptide of 6 AAs. To get this to work, I had to manually increase the maximum number of iterations in the function "_rebuild_conjugated_bonds" in "MDAnalysis/converters/RDKit.py" and split the protein into three pieces, running the analysis separately on each piece.

josevlibera2010 avatar Jul 22 '22 15:07 josevlibera2010
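
For readers hitting the same problem, here is a minimal sketch of the splitting step described above. It is not the code used in the report: the helper names `split_residues` and `to_selections` are hypothetical, and the residue numbering (1 to 1000) is an assumption for illustration. It only builds MDAnalysis-style selection strings for contiguous chunks of the protein, so that each chunk can be converted and fingerprinted on its own.

```python
def split_residues(first_resid, last_resid, n_chunks):
    """Split an inclusive residue-number range into contiguous chunks.

    Hypothetical helper: returns a list of (start, end) tuples, both inclusive.
    Any remainder is spread over the first chunks, one residue each.
    """
    total = last_resid - first_resid + 1
    if n_chunks < 1 or n_chunks > total:
        raise ValueError("n_chunks must be between 1 and the number of residues")
    base, extra = divmod(total, n_chunks)
    chunks = []
    start = first_resid
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        chunks.append((start, end))
        start = end + 1
    return chunks


def to_selections(chunks):
    """Turn (start, end) pairs into MDAnalysis selection strings."""
    return [f"protein and resid {start}:{end}" for start, end in chunks]


# Assumed numbering: a 1000-residue protein split into three pieces.
selections = to_selections(split_residues(1, 1000, 3))
for sel in selections:
    print(sel)
```

Each string could then be passed to `select_atoms` on the universe. Recent MDAnalysis versions also document a `max_iter` keyword on the RDKit converter, which may avoid editing the source file by hand; check the documentation for the version you have installed before relying on it.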

Hi,

This sounds more like an issue with the RDKitConverter from MDAnalysis, so I suggest that you open an issue with them for this.

Best, Cédric

cbouy avatar Aug 12 '22 18:08 cbouy

@cbouy not an issue for here, but for the RDKit converter, we could probably hard code the standard AAs and not use your heuristics for guessing bond orders.

This is what I currently do for loading PDB files w/ bond orders: https://github.com/OpenFreeEnergy/pdbinf/blob/main/src/pdbinf/_pdbinf.py#L114

Where the templates are: https://github.com/OpenFreeEnergy/pdbinf/blob/main/src/pdbinf/_standard_AAs.py

So we could either have this as a dep, or move the hardcoded CIF files over and rewrite the logic to avoid the RDKit dependency (though MDA will probably not mind including RDKit more often...)

richardjgowers avatar Oct 11 '23 10:10 richardjgowers

@richardjgowers I started reorganizing the converter code to make it more modular during the UGM hackathon, i.e. you'd have the possibility to provide your own callable to infer bond orders on a mol, so wrapping up a pdbinf-based callable could definitely be an option if I manage to make some time to finish the PR some day.

cbouy avatar Oct 11 '23 22:10 cbouy
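
To make the idea discussed in the last two comments concrete, here is a toy sketch of a converter that accepts a user-supplied callable for bond-order inference, with a template-based callable for known residues and a fallback for everything else. Everything in it is hypothetical: `convert`, `template_inferer`, `fallback_inferer`, and the `TEMPLATES` dictionary are illustrative names, not the MDAnalysis API, not the unfinished PR, and not the pdbinf data format. The fallback simply assigns single bonds and stands in for a real heuristic.

```python
from typing import Callable, Dict, List, Tuple

Bond = Tuple[str, str]          # pair of atom names within one residue
BondOrders = Dict[Bond, int]    # bond -> integer bond order
Inferer = Callable[[str, List[Bond]], BondOrders]

# Toy template table (assumption): only glycine, keyed by residue name.
TEMPLATES: Dict[str, BondOrders] = {
    "GLY": {("N", "CA"): 1, ("CA", "C"): 1, ("C", "O"): 2},
}


def fallback_inferer(resname: str, bonds: List[Bond]) -> BondOrders:
    """Placeholder for an iterative heuristic: every bond is set to single."""
    return {bond: 1 for bond in bonds}


def template_inferer(resname: str, bonds: List[Bond]) -> BondOrders:
    """Look bond orders up in a template; use the fallback for unknown residues."""
    template = TEMPLATES.get(resname)
    if template is None:
        return fallback_inferer(resname, bonds)
    # Bonds missing from the template default to single.
    return {bond: template.get(bond, 1) for bond in bonds}


def convert(residues: List[Tuple[str, List[Bond]]],
            infer_bond_orders: Inferer = fallback_inferer) -> List[BondOrders]:
    """Apply the chosen inference callable to each (resname, bonds) pair."""
    return [infer_bond_orders(resname, bonds) for resname, bonds in residues]


gly_bonds = [("N", "CA"), ("CA", "C"), ("C", "O")]
lig_bonds = [("C1", "C2")]
result = convert([("GLY", gly_bonds), ("LIG", lig_bonds)], template_inferer)
print(result)
```

The point of the design is that a template lookup needs no iteration at all for standard amino acids, so its cost does not depend on how long the protein is, while unusual residues still go through whatever heuristic the fallback implements.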