
TautomerCanonicalizer gives unexpected/forbidden form of phosphoric acid

Open · benbowen opened this issue 7 years ago · 3 comments

I'm converting all the molecules in my database to canonical tautomers and noticed that some of them, such as NADH, looked wrong. You can see it most plainly for phosphoric acid: I didn't expect a hydrogen on the phosphorus. Is this the correct/expected behavior?

from rdkit import Chem
from rdkit.Chem import Draw
from molvs.tautomer import TautomerCanonicalizer

# Phosphoric acid, H3PO4
original_smiles = 'OP(=O)(O)O'

original_mol = Chem.MolFromSmiles(original_smiles)
tautomerized_mol = TautomerCanonicalizer().canonicalize(original_mol)

# Draw the input and its canonical tautomer side by side
# (in a Jupyter notebook the returned image is displayed inline)
Draw.MolsToGridImage([original_mol, tautomerized_mol],
                     molsPerRow=3, subImgSize=(200, 200),
                     legends=['original', 'tautomer'])

[Image: 'original' and 'tautomer' structures of phosphoric acid]

benbowen · Feb 08 '18 19:02

NADH looks like this:

# NADH (reuses the imports from the snippet above)
original_smiles = 'NC(=O)C1=CN([C@@H]2O[C@H](COP(=O)(O)OP(=O)(O)OC[C@H]3O[C@@H](N4C=NC5=C4N=CN=C5N)[C@H](O)[C@@H]3O)[C@@H](O)[C@H]2O)C=CC1'

original_mol = Chem.MolFromSmiles(original_smiles)
tautomerized_mol = TautomerCanonicalizer().canonicalize(original_mol)

# One molecule per row, with a larger canvas for the bigger structure
Draw.MolsToGridImage([original_mol, tautomerized_mol],
                     molsPerRow=1, subImgSize=(600, 300),
                     legends=['original', 'tautomer'])

[Image: 'original' and 'tautomer' structures of NADH]

benbowen · Feb 08 '18 19:02

I think this is caused by the phosphonic acid rules: https://github.com/mcs07/MolVS/blob/master/molvs/tautomer.py#L130

It can probably be fixed by making the SMARTS pattern stricter, so that it matches only the intended target: https://en.wikipedia.org/wiki/Phosphorous_acid
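A toy sketch of what "stricter" could mean, in plain Python rather than RDKit. It models only the bond bookkeeping around a single phosphorus atom and is not MolVS's rule engine; the names `PCentre`, `phosphonic_forward_loose` and `phosphonic_forward_strict` are hypothetical. The loose rule fires on any P-OH whose phosphorus carries no H; the strict version additionally requires a trivalent phosphorus, as in P(OH)3. In SMARTS, a guard like that might be written with the `X` and `v` primitives on the phosphorus atom (e.g. `[P;X3;v3]`), but that pattern is an untested assumption.

```python
# Hypothetical, simplified model of one phosphorus centre (NOT MolVS or RDKit code).
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PCentre:
    p_o_double: int   # number of P=O bonds
    p_oh_single: int  # number of P-OH bonds
    p_h: int          # hydrogens attached to P

    def valence(self) -> int:
        # Total bond order around P: 2 per P=O, 1 per P-OH, 1 per P-H
        return 2 * self.p_o_double + self.p_oh_single + self.p_h


def phosphonic_forward_loose(p: PCentre) -> Optional[PCentre]:
    # Loose rule: fires on any P-OH as long as P carries no H
    if p.p_oh_single == 0 or p.p_h != 0:
        return None
    # Move the H from O onto P and turn that P-O single bond into P=O
    return PCentre(p.p_o_double + 1, p.p_oh_single - 1, p.p_h + 1)


def phosphonic_forward_strict(p: PCentre) -> Optional[PCentre]:
    # Stricter rule: additionally require a trivalent P, as in P(OH)3
    if p.valence() != 3:
        return None
    return phosphonic_forward_loose(p)


PHOSPHORIC_ACID = PCentre(p_o_double=1, p_oh_single=3, p_h=0)   # OP(=O)(O)O
PHOSPHOROUS_ACID = PCentre(p_o_double=0, p_oh_single=3, p_h=0)  # OP(O)O

if __name__ == "__main__":
    for name, acid in [("phosphoric", PHOSPHORIC_ACID), ("phosphorous", PHOSPHOROUS_ACID)]:
        loose = phosphonic_forward_loose(acid)
        strict = phosphonic_forward_strict(acid)
        # P valence: before, after the loose rule, after the strict rule (None = no match)
        print(name, acid.valence(),
              loose.valence() if loose else None,
              strict.valence() if strict else None)
```

Run as a script, it prints `phosphoric 5 7 None` and `phosphorous 3 5 5`: in this model the loose rule turns pentavalent phosphoric acid into a 7-valent species, while the strict rule leaves it alone and still converts P(OH)3 to HP(=O)(OH)2.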

mcs07 · Feb 09 '18 14:02

You're right: removing that rule stops that moiety from being modified. When you say "more strict", do you mean specifying an explicit number of bonds on the phosphorus in the SMARTS pattern?

Why does RDKit allow 7 bonds on the phosphorus? RDKit is a vast package, but looking at the definition of phosphorus, its maximum number of bonds is 5.
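Counting bond orders around phosphorus (2 per P=O, 1 per P-O or P-H) shows where a count of 7 can come from, assuming the rule moves one hydrogen from an OH onto P and turns that P-O bond into P=O. For phosphoric acid, and for comparison phosphorous acid:

```math
\begin{aligned}
v_{\text{before}}\big(\mathrm{OP(=O)(O)O}\big) &= \underbrace{2}_{\mathrm{P{=}O}} + \underbrace{3 \cdot 1}_{\mathrm{P{-}OH}} = 5 \\
v_{\text{after}} &= \underbrace{2 \cdot 2}_{\mathrm{P{=}O}} + \underbrace{2 \cdot 1}_{\mathrm{P{-}OH}} + \underbrace{1}_{\mathrm{P{-}H}} = 7 \\
v_{\text{before}}\big(\mathrm{OP(O)O}\big) &= 3 \cdot 1 = 3,
\qquad v_{\text{after}} = 2 + 2 \cdot 1 + 1 = 5
\end{aligned}
```

Each application raises the phosphorus valence by 2, which is harmless for trivalent P but overshoots 5 for phosphoric acid. Whether RDKit's sanitization then accepts 7 would depend on its list of allowed valences for phosphorus, which is worth checking rather than assuming.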

If I call SanitizeMol, the hydrogen stays put. When I paste the structure into ChemDraw, it's not valid.

benbowen · Feb 09 '18 17:02