Fixing does not appear to infer valence and formal charge states for molecules from some PDB files
Thank you for this nice library!
I have a question about fixing 'broken' Mols by inferring the correct valences and charges, which I was hoping datamol could do for me.
If I load NAP structures from examples in the PDB (e.g. 5ocm) and simply transfer over the bond annotations and atoms (formal charge is not specified in this PDB file, so I'm assuming a charge of 0), I end up with a structure like this:
smi = "c1cc(c[n](c1)[C@H]2[C@@H]([C@@H]([C@H](O2)CO[P@@](=O)([O])O[P@](=O)(O)OC[C@@H]3[C@H]([C@H]([C@@H](O3)n4cnc5c4ncnc5N)OP(=O)(O)O)O)O)O)C(=O)N"
# The correct SMILES would be:
smi_correct = "c1cc(c[n+](c1)[C@H]2[C@@H]([C@@H]([C@H](O2)CO[P@@](=O)([O-])O[P@](=O)(O)OC[C@@H]3[C@H]([C@H]([C@@H](O3)n4cnc5c4ncnc5N)OP(=O)(O)O)O)O)O)C(=O)N"
RDKit then fails to load this because of sanitization problems:
mol = Chem.MolFromSmiles(smi)  # fails: sanitization error, returns None
mol = Chem.MolFromSmiles(smi, sanitize=False)  # works, and produces the structure above, which is an invalid molecule
This molecule can be 'rescued' by assigning a positive charge to nitrogen number 4, but the datamol pipeline unfortunately fails to do this:
import datamol as dm
from rdkit import Chem
# Standardize and sanitize
mol = Chem.MolFromSmiles(smi, sanitize=False)
mol = dm.fix_mol(mol)
mol = dm.sanitize_mol(mol)
mol = dm.standardize_mol(mol)
Chem.SanitizeMol(mol)
Is there a way to fix this structure computationally with datamol?
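For reference, here is a minimal sketch of the manual rescue described above, using plain RDKit rather than datamol. It is only a workaround for this one molecule, not a general charge-inference method. Atom index 4 is the nitrogen mentioned above. Index 15 for the bracketed `[O]` is counted by hand from the atom order in the SMILES, so treat it as an assumption. The helper name `rescue_with_manual_charges` and the `CHARGE_FIXES` table are hypothetical and exist only for illustration.

```python
from rdkit import Chem

smi = (
    "c1cc(c[n](c1)[C@H]2[C@@H]([C@@H]([C@H](O2)CO[P@@](=O)([O])O[P@](=O)(O)"
    "OC[C@@H]3[C@H]([C@H]([C@@H](O3)n4cnc5c4ncnc5N)OP(=O)(O)O)O)O)O)C(=O)N"
)

# Atom index -> (expected element, formal charge to assign).
# Index 4 is the pyridinium nitrogen; index 15 is the bracketed [O] on the
# phosphate (hand-counted from the SMILES atom order: an assumption).
CHARGE_FIXES = {4: ("N", 1), 15: ("O", -1)}


def rescue_with_manual_charges(smiles, fixes):
    """Parse without sanitization, set the given formal charges, then sanitize."""
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    if mol is None:
        raise ValueError("SMILES could not be parsed even without sanitization")
    for idx, (symbol, charge) in fixes.items():
        atom = mol.GetAtomWithIdx(idx)
        # Guard against a wrong index before touching the atom.
        if atom.GetSymbol() != symbol:
            raise ValueError(f"atom {idx} is {atom.GetSymbol()}, expected {symbol}")
        atom.SetFormalCharge(charge)
    # Raises if the molecule is still chemically invalid.
    Chem.SanitizeMol(mol)
    return mol


fixed = rescue_with_manual_charges(smi, CHARGE_FIXES)
print(Chem.MolToSmiles(fixed))
```

This only works because the offending atoms are known in advance. What I am after is a way to find those atoms and charges automatically.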