Nitrate salt incorrectly(?) connected to compound in opsin 2.5.0

Open: greglandrum opened this issue 4 years ago • 1 comment

Hi,

I'm doing a very much overdue update of the opsin library used in the RDKit KNIME nodes (we're currently using 1.3.0) and while running the tests, I ran across the following change.

Parsing the IUPAC name 2-[3-[(4-amino-2-methylpyrimidin-5-yl)methyl]-4-methyl-1,3-thiazol-3-ium-5-yl]ethanol nitrate with v2.5 I get the SMILES:

[N+](=O)([O-])OCCC1=C([N+](=CS1)CC=1C(=NC(=NC1)C)N)C

Whereas v1.3.0 produced:

Cc1c(CCO)sc[n+]1Cc1cnc(C)nc1N.O=[N+]([O-])[O-]

In the new version the nitrate group has been connected (through the terminal OH) to the molecule.
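A quick way to see the difference without any cheminformatics toolkit is to count the dot-separated components of each SMILES string. This is only a sketch: `count_components` is a hypothetical helper (not part of OPSIN or RDKit), and it assumes that no ring-closure digits bond across a dot, which holds for both strings above.

```python
def count_components(smiles: str) -> int:
    """Count dot-separated parts of a SMILES string.

    Assumption: no ring-closure bond spans a '.', so each part
    is a separate fragment. A real toolkit should be used otherwise.
    """
    return len([part for part in smiles.split(".") if part])


# SMILES strings copied from the two OPSIN versions discussed above.
V2_5_SMILES = "[N+](=O)([O-])OCCC1=C([N+](=CS1)CC=1C(=NC(=NC1)C)N)C"
V1_3_SMILES = "Cc1c(CCO)sc[n+]1Cc1cnc(C)nc1N.O=[N+]([O-])[O-]"

for label, smiles in (("v2.5", V2_5_SMILES), ("v1.3.0", V1_3_SMILES)):
    print(label, count_components(smiles))
# prints "v2.5 1" then "v1.3.0 2"
```

The v2.5 output is a single fragment, while the v1.3.0 output is a cation plus a separate nitrate anion.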

I don't pretend to be good at IUPAC nomenclature, so I tried converting the same name in both BioVia Draw and ChemDraw and in both cases got the disconnected structure that Opsin used to produce.

Is this change intentional?

greglandrum, Feb 22 '21 12:02

There's unfortunately ambiguity between common usage and IUPAC nomenclature. In common usage, alcohols and groups like nitrate and acetate can implicitly be connected with loss of water, i.e. the name is read as an ester.

This particular instance is, in my opinion, incorrectly handled by OPSIN: given that the connected interpretation leads to a charged structure, the salt interpretation is far more likely. Older versions of OPSIN only supported the salt interpretation.
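The charge argument can be checked roughly by summing the formal charges written in the bracket atoms of each SMILES string. The following is a minimal sketch, not how OPSIN decides between interpretations: `net_formal_charge` is a hypothetical helper, and it assumes charges appear only inside square brackets in the usual `+`, `-`, `++` or `+2` forms.

```python
import re

# Contents of each bracket atom, e.g. "N+" from "[N+]".
BRACKET_ATOM = re.compile(r"\[([^\]]+)\]")
# A run of sign characters, optionally followed by a magnitude.
CHARGE = re.compile(r"([+-]+)(\d*)")


def net_formal_charge(smiles: str) -> int:
    """Sum the formal charges declared in bracket atoms."""
    total = 0
    for atom in BRACKET_ATOM.findall(smiles):
        match = CHARGE.search(atom)
        if match is None:
            continue  # neutral bracket atom such as [nH]
        signs, digits = match.groups()
        sign = 1 if signs[0] == "+" else -1
        magnitude = int(digits) if digits else len(signs)
        total += sign * magnitude
    return total


# SMILES strings from the issue: connected (v2.5) and salt (v1.3.0).
CONNECTED = "[N+](=O)([O-])OCCC1=C([N+](=CS1)CC=1C(=NC(=NC1)C)N)C"
SALT = "Cc1c(CCO)sc[n+]1Cc1cnc(C)nc1N.O=[N+]([O-])[O-]"

print(net_formal_charge(CONNECTED))  # prints 1
print(net_formal_charge(SALT))       # prints 0
```

The connected structure carries a net charge of +1 with no counterion, whereas the salt interpretation is neutral overall.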

ChemDraw implements what I think is ideal behaviour: your name as entered gives a salt, but 2-[3-[(4-amino-2-methylpyrimidin-5-yl)methyl]-4-methyl-1,3-thiazol-5-yl]ethanol nitrate (the same name without the "-3-ium") gives a connected structure.

dan2097, Feb 22 '21 13:02