
Incorrect SMILES for nitrogenated species

Open xiaoruiDong opened this issue 2 years ago • 0 comments

According to Hao-Wei's presentation at the RMG subgroup meeting (Feb 23, 2022), a few nitrogenated species appear to have incorrect SMILES. From my investigation, this is caused neither by the resonance module nor by the RDKit SMILES generation. It comes from using Open Babel to export SMILES for nitrogenated species.

How to replicate the bug:

from openbabel import openbabel  # Open Babel 3.x import style (assumed)
from rmgpy.molecule import Molecule
from rmgpy.molecule.converter import to_ob_mol

m = Molecule().from_adjacency_list(
"""multiplicity 2
1 O u0 p2 c0 {2,S} {13,S}
2 N u0 p1 c0 {1,S} {3,S} {4,S}
3 C u0 p0 c0 {2,S} {5,S} {14,S} {15,S}
4 C u0 p0 c0 {2,S} {6,S} {16,S} {17,S}
5 C u0 p0 c0 {3,S} {18,S} {19,S} {20,S}
6 C u0 p0 c0 {4,S} {21,S} {22,S} {23,S}
7 C u0 p0 c0 {8,B} {9,B} {13,S}
8 C u0 p0 c0 {7,B} {10,B} {24,S}
9 C u0 p0 c0 {7,B} {12,B} {28,S}
10 C u0 p0 c0 {8,B} {11,B} {25,S}
11 C u0 p0 c0 {10,B} {12,B} {26,S}
12 C u0 p0 c0 {9,B} {11,B} {27,S}
13 C u1 p0 c0 {1,S} {7,S} {29,S}
14 H u0 p0 c0 {3,S}
15 H u0 p0 c0 {3,S}
16 H u0 p0 c0 {4,S}
17 H u0 p0 c0 {4,S}
18 H u0 p0 c0 {5,S}
19 H u0 p0 c0 {5,S}
20 H u0 p0 c0 {5,S}
21 H u0 p0 c0 {6,S}
22 H u0 p0 c0 {6,S}
23 H u0 p0 c0 {6,S}
24 H u0 p0 c0 {8,S}
25 H u0 p0 c0 {10,S}
26 H u0 p0 c0 {11,S}
27 H u0 p0 c0 {12,S}
28 H u0 p0 c0 {9,S}
29 H u0 p0 c0 {13,S}
""")

# You can check properties like multiplicity, bond orders, and they look correct
# However the SMILES is incorrect:
m.to_smiles()

# In more detail, the SMILES generation involves the following steps:
ob_mol = to_ob_mol(m)
obconv = openbabel.OBConversion()
obconv.SetOutFormat('smi')
obconv.WriteString(ob_mol)
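
As an independent sanity check on the input, the adjacency list itself can be inspected without RMG or Open Babel. The following is only a minimal sketch: `parse_adjacency_list`, `unmirrored_bonds`, and `bond_order_sums` are hypothetical helpers written for this issue, not RMG-Py functions, and the parser assumes the simple `index element uN pN cN {neighbor,label}` line layout used above (no atom labels such as `*1`). Benzene bonds (`B`) are counted as 1.5.

```python
import re

# Bond-order values for adjacency-list bond labels (B = benzene/aromatic).
BOND_ORDER = {"S": 1.0, "D": 2.0, "T": 3.0, "B": 1.5}


def parse_adjacency_list(text):
    """Return {atom_index: (element, unpaired_electrons, {neighbor: label})}."""
    atoms = {}
    for line in text.strip().splitlines():
        tokens = line.split()
        if not tokens or not tokens[0].isdigit():
            continue  # skip the 'multiplicity N' header and blank lines
        index, element = int(tokens[0]), tokens[1]
        unpaired = int(tokens[2][1:])  # 'u1' -> 1
        bonds = {int(i): label
                 for i, label in re.findall(r"\{(\d+),(\w)\}", line)}
        atoms[index] = (element, unpaired, bonds)
    return atoms


def unmirrored_bonds(atoms):
    """List (a, b) pairs where atom a lists a bond to b but b does not list
    the same bond label back to a."""
    problems = []
    for index, (_, _, bonds) in atoms.items():
        for neighbor, label in bonds.items():
            back = atoms.get(neighbor, (None, 0, {}))[2].get(index)
            if back != label:
                problems.append((index, neighbor))
    return problems


def bond_order_sums(atoms):
    """Sum of bond orders per atom, as written in the adjacency list."""
    return {index: sum(BOND_ORDER[label] for label in bonds.values())
            for index, (_, _, bonds) in atoms.items()}


# Small illustrative input (methyl radical), not the molecule from this issue.
EXAMPLE = """multiplicity 2
1 C u1 p0 c0 {2,S} {3,S} {4,S}
2 H u0 p0 c0 {1,S}
3 H u0 p0 c0 {1,S}
4 H u0 p0 c0 {1,S}
"""

example_atoms = parse_adjacency_list(EXAMPLE)
```

Running the same helpers on the adjacency list from this issue gives per-atom bond-order sums that can be compared with the valences Open Babel reports for the converted molecule.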

I don't think the SMILES writer itself causes the issue. I tried using Open Babel to read the molecule from SMILES and write it back out, and the result is correct.

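# Round trip through Open Babel only: read a SMILES string, write it back out
obconv1 = openbabel.OBConversion()
obconv1.SetInAndOutFormats('smi', 'smi')
ob_mol_1 = openbabel.OBMol()
obconv1.ReadString(ob_mol_1, 'CCN(O[CH]C1=CC=CC=C1)CC')
obconv1.WriteString(ob_mol_1)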

The ob_mol generated by to_ob_mol seems incorrect. For example, bonds and atoms are not set as aromatic (checked with IsAromatic()), but resetting them to aromatic still doesn't yield the correct SMILES. One peculiar thing is that the explicit valence and total valence don't look right (11). However, I haven't investigated this further.
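
The checks described above could be gathered into one helper that dumps the relevant per-atom and per-bond fields. This is a rough sketch, assuming the Open Babel 3.x Python bindings, where `OBAtom` provides `GetExplicitValence()` and `GetTotalValence()`; `describe_ob_mol` is a hypothetical name, not part of RMG-Py or Open Babel. It only calls methods on the object passed in, so it needs no imports of its own.

```python
def describe_ob_mol(ob_mol):
    """Collect aromaticity and valence diagnostics from an OBMol-like object.

    Returns (atoms, bonds), each a list of plain dictionaries.
    """
    atoms = []
    # OBMol atom indices are 1-based.
    for idx in range(1, ob_mol.NumAtoms() + 1):
        atom = ob_mol.GetAtom(idx)
        atoms.append({
            "idx": idx,
            "atomic_num": atom.GetAtomicNum(),
            "aromatic": bool(atom.IsAromatic()),
            "explicit_valence": atom.GetExplicitValence(),
            "total_valence": atom.GetTotalValence(),
            "spin_multiplicity": atom.GetSpinMultiplicity(),
        })

    bonds = []
    # OBMol bond indices are 0-based.
    for idx in range(ob_mol.NumBonds()):
        bond = ob_mol.GetBond(idx)
        bonds.append({
            "begin": bond.GetBeginAtomIdx(),
            "end": bond.GetEndAtomIdx(),
            "order": bond.GetBondOrder(),
            "aromatic": bool(bond.IsAromatic()),
        })
    return atoms, bonds


def suspicious_atoms(atoms, max_valence=4):
    """Return atoms whose explicit or total valence exceeds max_valence.

    The default of 4 suits C, N, O, and H only; it is an assumption for
    this molecule, not a general chemical rule.
    """
    return [a for a in atoms
            if a["explicit_valence"] > max_valence
            or a["total_valence"] > max_valence]
```

Calling `describe_ob_mol(to_ob_mol(m))` and `describe_ob_mol(ob_mol_1)` and comparing the two outputs atom by atom should show where the converted molecule differs from the one Open Babel parsed from SMILES.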

xiaoruiDong, Mar 01 '22 06:03