RMG-Py
Incorrect SMILES for nitrogenated species
As shown in Hao-Wei's presentation at the RMG subgroup meeting (Feb 23, 2022), a few nitrogenated species appear to have incorrect SMILES. From my investigation, this is caused neither by the `resonance` module nor by the `rdkit` SMILES export. It is caused by the use of `openbabel` to export SMILES for nitrogenated species.
How to replicate the bug:
```python
from openbabel import openbabel  # Open Babel 3.x; on 2.x use `import openbabel`

from rmgpy.molecule import Molecule
from rmgpy.molecule.converter import to_ob_mol

m = Molecule().from_adjacency_list(
"""multiplicity 2
1 O u0 p2 c0 {2,S} {13,S}
2 N u0 p1 c0 {1,S} {3,S} {4,S}
3 C u0 p0 c0 {2,S} {5,S} {14,S} {15,S}
4 C u0 p0 c0 {2,S} {6,S} {16,S} {17,S}
5 C u0 p0 c0 {3,S} {18,S} {19,S} {20,S}
6 C u0 p0 c0 {4,S} {21,S} {22,S} {23,S}
7 C u0 p0 c0 {8,B} {9,B} {13,S}
8 C u0 p0 c0 {7,B} {10,B} {24,S}
9 C u0 p0 c0 {7,B} {12,B} {28,S}
10 C u0 p0 c0 {8,B} {11,B} {25,S}
11 C u0 p0 c0 {10,B} {12,B} {26,S}
12 C u0 p0 c0 {9,B} {11,B} {27,S}
13 C u1 p0 c0 {1,S} {7,S} {29,S}
14 H u0 p0 c0 {3,S}
15 H u0 p0 c0 {3,S}
16 H u0 p0 c0 {4,S}
17 H u0 p0 c0 {4,S}
18 H u0 p0 c0 {5,S}
19 H u0 p0 c0 {5,S}
20 H u0 p0 c0 {5,S}
21 H u0 p0 c0 {6,S}
22 H u0 p0 c0 {6,S}
23 H u0 p0 c0 {6,S}
24 H u0 p0 c0 {8,S}
25 H u0 p0 c0 {10,S}
26 H u0 p0 c0 {11,S}
27 H u0 p0 c0 {12,S}
28 H u0 p0 c0 {9,S}
29 H u0 p0 c0 {13,S}
""")

# Properties such as multiplicity and bond orders look correct,
# but the SMILES is incorrect:
print(m.to_smiles())

# In more depth, SMILES generation involves the following steps:
ob_mol = to_ob_mol(m)                 # RMG Molecule -> Open Babel OBMol
obconv = openbabel.OBConversion()
obconv.SetOutFormat('smi')
print(obconv.WriteString(ob_mol))     # same incorrect SMILES
```
I don't think the SMILES writing step (`OBConversion`) is the cause. When I use Open Babel to read the molecule from SMILES and then write the SMILES back out, the result is correct:
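```python
# Round trip through Open Babel alone: read the expected SMILES, then write it
# back out with the same `obconv` writer used above.
obconv1 = openbabel.OBConversion()
obconv1.SetInFormat('smi')
ob_mol_1 = openbabel.OBMol()
obconv1.ReadString(ob_mol_1, 'CCN(O[CH]C1=CC=CC=C1)CC')
print(obconv.WriteString(ob_mol_1))  # correct SMILES
```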
The `ob_mol` generated by `to_ob_mol` seems to be incorrect. For example, its atoms and bonds are not set as aromatic (checked with `IsAromatic()`), but setting them to aromatic still does not yield the correct SMILES. Another peculiarity is that the explicit valence and total valence do not look right (11). I haven't investigated this further yet.
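As a reference point when inspecting the atoms of `ob_mol`, here is a minimal, dependency-free sketch (plain Python, no RMG or Open Babel needed) that recomputes from the adjacency list above the values one would expect: each atom's bond-order sum, which atoms carry benzene (`B`) bonds and so should be flagged aromatic, and whether the declared multiplicity matches the unpaired electrons. The parser only handles the simple line format used in this issue. Counting `B` bonds as order 1.5, ignoring formal charges and assuming high spin are simplifying assumptions of this sketch, not RMG's own valence logic.

```python
# Sketch: expected valences and aromatic atoms from an RMG adjacency list.
# Assumptions: the simple "index element uN pN cN {neighbor,order} ..." format
# used in this issue, neutral atoms (cN ignored), and "B" bonds counted as 1.5.

ADJLIST = """multiplicity 2
1 O u0 p2 c0 {2,S} {13,S}
2 N u0 p1 c0 {1,S} {3,S} {4,S}
3 C u0 p0 c0 {2,S} {5,S} {14,S} {15,S}
4 C u0 p0 c0 {2,S} {6,S} {16,S} {17,S}
5 C u0 p0 c0 {3,S} {18,S} {19,S} {20,S}
6 C u0 p0 c0 {4,S} {21,S} {22,S} {23,S}
7 C u0 p0 c0 {8,B} {9,B} {13,S}
8 C u0 p0 c0 {7,B} {10,B} {24,S}
9 C u0 p0 c0 {7,B} {12,B} {28,S}
10 C u0 p0 c0 {8,B} {11,B} {25,S}
11 C u0 p0 c0 {10,B} {12,B} {26,S}
12 C u0 p0 c0 {9,B} {11,B} {27,S}
13 C u1 p0 c0 {1,S} {7,S} {29,S}
14 H u0 p0 c0 {3,S}
15 H u0 p0 c0 {3,S}
16 H u0 p0 c0 {4,S}
17 H u0 p0 c0 {4,S}
18 H u0 p0 c0 {5,S}
19 H u0 p0 c0 {5,S}
20 H u0 p0 c0 {5,S}
21 H u0 p0 c0 {6,S}
22 H u0 p0 c0 {6,S}
23 H u0 p0 c0 {6,S}
24 H u0 p0 c0 {8,S}
25 H u0 p0 c0 {10,S}
26 H u0 p0 c0 {11,S}
27 H u0 p0 c0 {12,S}
28 H u0 p0 c0 {9,S}
29 H u0 p0 c0 {13,S}
"""

BOND_ORDER = {"S": 1.0, "D": 2.0, "T": 3.0, "B": 1.5}
# For a neutral atom, bond-order sum + unpaired electrons should equal this.
NEUTRAL_VALENCE = {"C": 4, "N": 3, "O": 2, "H": 1}


def parse(text):
    """Return (multiplicity, {index: (element, unpaired, [(neighbor, order), ...])})."""
    multiplicity, atoms = None, {}
    for line in text.strip().splitlines():
        tokens = line.split()
        if tokens[0] == "multiplicity":
            multiplicity = int(tokens[1])
            continue
        bonds = []
        for token in tokens[5:]:
            neighbor, order = token.strip("{}").split(",")
            bonds.append((int(neighbor), order))
        atoms[int(tokens[0])] = (tokens[1], int(tokens[2][1:]), bonds)
    return multiplicity, atoms


def summarize(text):
    multiplicity, atoms = parse(text)
    valence = {i: sum(BOND_ORDER[o] for _, o in bonds) for i, (_, _, bonds) in atoms.items()}
    # Atoms whose bond-order sum plus radical electrons differs from the neutral valence.
    bad = [i for i, (el, u, _) in atoms.items() if valence[i] + u != NEUTRAL_VALENCE[el]]
    # Atoms with at least one benzene bond are the ones IsAromatic() should flag.
    aromatic = sorted(i for i, (_, _, bonds) in atoms.items() if any(o == "B" for _, o in bonds))
    # High-spin assumption: multiplicity = number of unpaired electrons + 1.
    spin_ok = sum(u for _, u, _ in atoms.values()) + 1 == multiplicity
    return valence, bad, aromatic, spin_ok


valence, bad, aromatic, spin_ok = summarize(ADJLIST)
print("max bond-order sum:", max(valence.values()))
print("valence problems:", bad)
print("aromatic atoms:", aromatic)
print("multiplicity consistent:", spin_ok)
```

With `B` bonds counted as 1.5, no atom in this molecule has a bond-order sum above 4, every atom passes the valence check, and atoms 7 to 12 are the ones expected to be aromatic. A valence of 11 is therefore far outside what the adjacency list itself implies.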