
Fixing does not appear to work for inferring valence & formal charge states from molecules from some PDB files

Open Croydon-Brixton opened this issue 5 months ago • 4 comments

Thank you for this nice library!

I have a question about fixing 'broken' Mols by inferring the correct valences and charges, which I was hoping datamol could do for me.

If I load NAP structures from PDB entries (e.g. 5ocm) and simply transfer over the atoms and bond annotations (formal charges are not specified in this PDB file, so I assume a charge of 0), I end up with a structure like this:

smi = "c1cc(c[n](c1)[C@H]2[C@@H]([C@@H]([C@H](O2)CO[P@@](=O)([O])O[P@](=O)(O)OC[C@@H]3[C@H]([C@H]([C@@H](O3)n4cnc5c4ncnc5N)OP(=O)(O)O)O)O)O)C(=O)N"

# The correct SMILES would be:
smi_correct = "c1cc(c[n+](c1)[C@H]2[C@@H]([C@@H]([C@H](O2)CO[P@@](=O)([O-])O[P@](=O)(O)OC[C@@H]3[C@H]([C@H]([C@@H](O3)n4cnc5c4ncnc5N)OP(=O)(O)O)O)O)O)C(=O)N"

[Screenshot, 2024-09-02: depiction of the structure]

RDKit then fails to load this because of sanitization problems:

from rdkit import Chem

mol = Chem.MolFromSmiles(smi)  # fails: sanitization error, returns None
mol = Chem.MolFromSmiles(smi, sanitize=False)  # works, but produces the structure above, which is an invalid molecule
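To see which sanitization step rejects the structure, RDKit's `Chem.DetectChemistryProblems` can be run on the unsanitized molecule. This is only a diagnostic sketch: it reports what RDKit complains about and does not fix anything. The variable names `unsanitized` and `problems` are just illustrative.

```python
from rdkit import Chem

# SMILES from this issue: NAP with the formal charges missing
smi = "c1cc(c[n](c1)[C@H]2[C@@H]([C@@H]([C@H](O2)CO[P@@](=O)([O])O[P@](=O)(O)OC[C@@H]3[C@H]([C@H]([C@@H](O3)n4cnc5c4ncnc5N)OP(=O)(O)O)O)O)O)C(=O)N"

# Parse without sanitization so that we get a Mol object to inspect
unsanitized = Chem.MolFromSmiles(smi, sanitize=False)

# Collect the sanitization problems instead of stopping at the first one
problems = Chem.DetectChemistryProblems(unsanitized)
for problem in problems:
    # GetType() names the exception class, Message() gives the RDKit error text
    print(problem.GetType(), "|", problem.Message())
```

The reported messages list the atom indices RDKit could not handle, which helps to decide where a formal charge is missing.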

This molecule can be 'rescued' by assigning a positive charge to the nitrogen at atom index 4, but the datamol pipeline unfortunately fails to do this:

import datamol as dm
from rdkit import Chem

# Standardize and sanitize
mol = Chem.MolFromSmiles(smi, sanitize=False)
mol = dm.fix_mol(mol)
mol = dm.sanitize_mol(mol)
mol = dm.standardize_mol(mol)
Chem.SanitizeMol(mol)
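For reference, the manual rescue can be sketched with plain RDKit. This is not a datamol feature and not a general solution: the helper name `rescue_nap` is hypothetical, atom index 4 is hard-coded for this particular SMILES, and the oxygen is picked with a heuristic that only works because `[O]` was written as a bracket atom here (terminal, single-bonded, no hydrogens allowed).

```python
from rdkit import Chem

smi = "c1cc(c[n](c1)[C@H]2[C@@H]([C@@H]([C@H](O2)CO[P@@](=O)([O])O[P@](=O)(O)OC[C@@H]3[C@H]([C@H]([C@@H](O3)n4cnc5c4ncnc5N)OP(=O)(O)O)O)O)O)C(=O)N"
smi_correct = "c1cc(c[n+](c1)[C@H]2[C@@H]([C@@H]([C@H](O2)CO[P@@](=O)([O-])O[P@](=O)(O)OC[C@@H]3[C@H]([C@H]([C@@H](O3)n4cnc5c4ncnc5N)OP(=O)(O)O)O)O)O)C(=O)N"


def rescue_nap(mol, nitrogen_idx=4):
    """Hypothetical helper: assign the missing charges by hand, then sanitize."""
    mol = Chem.RWMol(mol)

    # Assumption: the pyridinium nitrogen sits at this index (true for `smi`)
    mol.GetAtomWithIdx(nitrogen_idx).SetFormalCharge(1)

    # Heuristic: a neutral terminal oxygen with a single bond and no
    # hydrogens (written as [O] in the SMILES) is treated as O-
    for atom in mol.GetAtoms():
        if (
            atom.GetSymbol() == "O"
            and atom.GetFormalCharge() == 0
            and atom.GetDegree() == 1
            and atom.GetNoImplicit()
            and atom.GetNumExplicitHs() == 0
            and atom.GetBonds()[0].GetBondType() == Chem.BondType.SINGLE
        ):
            atom.SetFormalCharge(-1)

    Chem.SanitizeMol(mol)  # raises if the structure is still invalid
    return mol.GetMol()


fixed = rescue_nap(Chem.MolFromSmiles(smi, sanitize=False))
reference = Chem.MolFromSmiles(smi_correct)

# Compare canonical SMILES of the rescued molecule and the expected one
print(Chem.MolToSmiles(fixed))
print(Chem.MolToSmiles(reference))
```

What I am looking for is a way to get this result without knowing the atom indices or the charges in advance.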

Is there a way to fix this structure computationally with datamol?

Croydon-Brixton, Sep 03 '24 02:09