ProLIF
Standardization using RDKit does not converge for very long protein segments
I have been trying to compute protein-protein IFPs between a protein with more than 1000 amino acids and a small 6-residue peptide. To make this work, I had to manually increase the maximum number of iterations in the function "_rebuild_conjugated_bonds" in "MDAnalysis/converters/RDKit.py", and split the analysis into three runs by cutting the protein into three pieces.
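For reference, the splitting step could look roughly like the sketch below. It divides an inclusive residue range into contiguous pieces of near-equal size and builds MDAnalysis selection strings using the `resid start:stop` range syntax. The helper names and the `"protein"` base selection are assumptions for illustration; in practice the base selection should be narrowed (for example with `segid` or `chainID`) so it does not also match the peptide.

```python
from typing import List, Tuple


def split_residue_range(first: int, last: int, n_chunks: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: split the inclusive range [first, last] into
    n_chunks contiguous, non-overlapping pieces whose sizes differ by at most one."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    total = last - first + 1
    if total < n_chunks:
        raise ValueError("more chunks than residues")
    base, extra = divmod(total, n_chunks)
    chunks = []
    start = first
    for i in range(n_chunks):
        # The first `extra` chunks get one additional residue
        size = base + (1 if i < extra else 0)
        chunks.append((start, start + size - 1))
        start += size
    return chunks


def to_selections(chunks: List[Tuple[int, int]], base: str = "protein") -> List[str]:
    """Build one MDAnalysis selection string per chunk (inclusive resid ranges).
    `base` is an assumed selection; narrow it so it excludes the peptide."""
    return [f"{base} and resid {a}:{b}" for a, b in chunks]


print(to_selections(split_residue_range(1, 1000, 3)))
```

Each selection string could then be passed to `Universe.select_atoms` and analysed against the peptide in a separate run. Note that every cut leaves a dangling backbone at the chunk boundary.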
Hi,
This sounds more like an issue with the RDKitConverter from MDAnalysis, so I suggest opening an issue on their side.
Best, Cédric
@cbouy Not an issue for here, but for the RDKit converter we could probably hard-code the standard amino acids instead of using your heuristics to guess bond orders.
This is what I currently do for loading PDB files with bond orders: https://github.com/OpenFreeEnergy/pdbinf/blob/main/src/pdbinf/_pdbinf.py#L114
Where the templates are: https://github.com/OpenFreeEnergy/pdbinf/blob/main/src/pdbinf/_standard_AAs.py
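The template idea can be illustrated with a minimal sketch: bond orders for standard residues are looked up by residue name and atom-name pair, and anything the templates do not cover returns `None` so a caller could fall back to the existing heuristics. The `TEMPLATES` dictionary here is hypothetical and hand-written (heavy atoms only, a handful of residues); it is not the pdbinf template set, which is built from CIF files.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple


def _template(double_bonds, single_bonds) -> Dict[FrozenSet[str], int]:
    """Build an order-independent {atom pair: bond order} mapping."""
    orders = {frozenset(p): 1 for p in single_bonds}
    orders.update({frozenset(p): 2 for p in double_bonds})
    return orders


BACKBONE_SINGLE = [("N", "CA"), ("CA", "C")]
BACKBONE_DOUBLE = [("C", "O")]

# Hypothetical heavy-atom templates (illustrative subset, not exhaustive).
# The inter-residue peptide bond (C of i to N of i+1) is single and is not included.
TEMPLATES = {
    "GLY": _template(BACKBONE_DOUBLE, BACKBONE_SINGLE),
    "ALA": _template(BACKBONE_DOUBLE, BACKBONE_SINGLE + [("CA", "CB")]),
    "SER": _template(BACKBONE_DOUBLE, BACKBONE_SINGLE + [("CA", "CB"), ("CB", "OG")]),
    # Deprotonated Asp: one Kekule form of the carboxylate
    "ASP": _template(BACKBONE_DOUBLE + [("CG", "OD1")],
                     BACKBONE_SINGLE + [("CA", "CB"), ("CB", "CG"), ("CG", "OD2")]),
}


def bond_orders_from_template(resname: str,
                              bonds: List[Tuple[str, str]]) -> Optional[Dict[Tuple[str, str], int]]:
    """Return {bond: order} for a templated residue, or None if the residue
    or any of its atom pairs is unknown (caller falls back to heuristics)."""
    template = TEMPLATES.get(resname)
    if template is None:
        return None
    orders = {}
    for a, b in bonds:
        key = frozenset((a, b))
        if key not in template:
            return None  # e.g. terminal OXT, hydrogens, or a modified residue
        orders[(a, b)] = template[key]
    return orders


print(bond_orders_from_template("ASP", [("N", "CA"), ("CG", "OD1"), ("OD2", "CG")]))
```

Returning `None` instead of guessing keeps the template path strict: only residues that match a template exactly skip the heuristic.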
So we could either add it as a dependency, or move the hardcoded CIF files over and rewrite the logic to avoid the RDKit dependency (though MDA probably won't mind requiring RDKit more often).
@richardjgowers During the UGM hackathon I started reorganizing the converter code to make it more modular, i.e. you would be able to provide your own callable to infer bond orders on a molecule. Wrapping a pdbinf-based callable could definitely be an option, if I manage to find time to finish the PR some day.
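A pluggable bond-order callable could work along the lines of the sketch below: the converter accepts any function that maps a residue to bond orders, and a small combinator tries a template-based inferer first and falls back to another one. Everything here (`convert`, `first_match`, the tuple-based residue model, and the `all_single` stand-in for a real heuristic) is hypothetical and is not the MDAnalysis RDKitConverter API or the unfinished PR.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

# Hypothetical data model: a residue is (resname, [(atom_name, atom_name), ...])
Residue = Tuple[str, List[Tuple[str, str]]]
BondOrders = Dict[FrozenSet[str], int]
# An inferer returns bond orders, or None if it cannot handle the residue
Inferer = Callable[[Residue], Optional[BondOrders]]

# Toy heavy-atom template for glycine only
_GLY = {
    frozenset(("N", "CA")): 1,
    frozenset(("CA", "C")): 1,
    frozenset(("C", "O")): 2,
}


def glycine_template(residue: Residue) -> Optional[BondOrders]:
    """Template-based inferer: handles GLY, declines everything else."""
    resname, bonds = residue
    keys = [frozenset(pair) for pair in bonds]
    if resname != "GLY" or any(k not in _GLY for k in keys):
        return None
    return {k: _GLY[k] for k in keys}


def all_single(residue: Residue) -> Optional[BondOrders]:
    """Placeholder standing in for a real heuristic: every bond is single."""
    _, bonds = residue
    return {frozenset(pair): 1 for pair in bonds}


def first_match(*inferers: Inferer) -> Callable[[Residue], BondOrders]:
    """Try each inferer in turn and return the first non-None result."""
    def combined(residue: Residue) -> BondOrders:
        for infer in inferers:
            result = infer(residue)
            if result is not None:
                return result
        raise ValueError(f"no inferer could handle residue {residue[0]!r}")
    return combined


def convert(residues: List[Residue],
            infer_bond_orders: Inferer = all_single) -> List[Optional[BondOrders]]:
    """Hypothetical converter entry point with a pluggable bond-order callable."""
    return [infer_bond_orders(res) for res in residues]


gly = ("GLY", [("N", "CA"), ("CA", "C"), ("C", "O")])
lig = ("LIG", [("C1", "O1")])
orders = convert([gly, lig], first_match(glycine_template, all_single))
print(orders[0][frozenset(("C", "O"))])   # 2, from the template
print(orders[1][frozenset(("C1", "O1"))])  # 1, from the fallback
```

Keeping the callable signature narrow (residue in, bond orders or `None` out) is what would let a pdbinf-based inferer slot in ahead of the existing heuristics without the converter knowing about either.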