[BUG] Incorrect charges from Sire to RDKit conversion
Describe the bug Converting some molecules from Sire to RDKit results in some strange formal charges and bonded structure. For example, the following molecule:
becomes (formal charges labelled):
This only happens if loading from a format which lacks bond-order information (eg. prmtop). Loading from an sdf, for example, works fine.
To Reproduce Steps to reproduce the behavior:
mol1 = sr.load('input.inpcrd','input.prmtop')[0]
rdmol1 = sr.convert.to_rdkit(mol1)
for atom in rdmol1.GetAtoms():
print(atom.GetFormalCharge())
mol2 = sr.load('input.sdf')
rdmol1 = sr.convert.to_rdkit(mol2)
for atom in rdmol2.GetAtoms():
print(atom.GetFormalCharge())
I'm not sure what the ideal solution would be, as the information about the correct structure isn't available. Possibly there could be some way to flag unlikely groups like carbanions?
In the absence of stereochemistry, the converter uses an inference routine based on one from MDAnalysis. This works well in many cases, but clearly isn't ideal for everything. I'll take a look at your example tomorrow in case there is a bug somewhere. There have been a few subtle issues in the past.
I also think the fallback relies on appropriate 3D coordinates, so might not work as intended for structures that aren't well minimised.
I just checked, and I get the same result when I minimise the molecule first
Thanks for the update. I've tried using some of the RDKit stereochemistry inference routines, but those fail to assign the correct formal charges too. I'll take a look at some of the sanitization options to see if disabling any helps in this case.