cis/trans Invariant Violation in TautomerEnumerator.Canonicalize
Hello,
I encountered an Invariant Violation error, likely caused by a bug or an unhandled edge case.
Describe the bug
TautomerEnumerator.Canonicalize fails with an Invariant Violation for the SMILES "CN1C=CC=C/C1=C\N=O". A minimal example is "C/C=C\N=O". The same SMILES with the cis/trans annotation removed, "CC=CN=O", does not cause the error. This seems to be a new issue in RDKit 2023.09.1: when I switch back to 2023.03.3, no error is thrown.
The traceback:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[31], line 10
8 display(mol)
9 enumerator = rdMolStandardize.TautomerEnumerator()
---> 10 enumerator.Canonicalize(mol)
RuntimeError: Invariant Violation
could not find atom2
Violation occurred on line 227 in file Code/GraphMol/Canon.cpp
Failed Expression: firstFromAtom2
RDKIT: 2023.09.1
BOOST: 1_78
To Reproduce
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize
original_smiles = "CN1C=CC=C/C1=C\\N=O"
minimal_smiles = "C/C=C\\N=O"
mol = Chem.MolFromSmiles(minimal_smiles)
# display(mol)
enumerator = rdMolStandardize.TautomerEnumerator()
enumerator.Canonicalize(mol)
Expected behavior
Calling TautomerEnumerator.Canonicalize on these SMILES should raise no error and should return the canonical tautomer.
Screenshots
The original SMILES "CN1C=CC=C/C1=C\N=O"
The minimal example SMILES "C/C=C\N=O"
Configuration (please complete the following information):
- RDKit version: 2023.09.1
- OS: Ubuntu 22.04.2 LTS (using WSL)
- Python version (if relevant): Python 3.8.16
- Are you using conda? yes
- If you are using conda, which channel did you install the rdkit from? pypi
Additional context
The output of "conda list | grep rdkit" is "rdkit 2023.9.1 pypi_0 pypi" for the failing environment and "rdkit 2023.03.3 py38h6c71e64_2 conda-forge" for the working environment.
I see the same problem with a nitro group next to a C=C double bond.
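In case it helps anyone hitting this before a fix lands, here is a rough sketch of a possible stopgap. It relies only on what is reported above: the failure surfaces in Python as a RuntimeError, and the SMILES without the cis/trans annotation does not fail. The names canonicalize_with_fallback and rdkit_demo are hypothetical helpers written for this illustration, not part of RDKit. I am also assuming that stripping stereo from an already parsed molecule with Chem.RemoveStereochemistry behaves like parsing the unannotated SMILES. The fallback discards all stereo information on the molecule, so its result is not what a fixed Canonicalize would return.

```python
def canonicalize_with_fallback(mol, canonicalize, strip_stereo):
    """Try canonicalize(mol); on RuntimeError, retry on a stereo-stripped copy.

    Returns (result, used_fallback). Hypothetical helper, not an RDKit API.
    """
    try:
        return canonicalize(mol), False
    except RuntimeError:
        # The Invariant Violation in the traceback above arrives as RuntimeError.
        # Retrying without stereo loses the cis/trans label (assumed acceptable).
        return canonicalize(strip_stereo(mol)), True


def rdkit_demo(smiles="C/C=C\\N=O"):
    """Apply the fallback to the minimal SMILES from this report."""
    # Imported here so the helper above can be used and tested without RDKit.
    from rdkit import Chem
    from rdkit.Chem.MolStandardize import rdMolStandardize

    enumerator = rdMolStandardize.TautomerEnumerator()

    def strip_stereo(mol):
        stripped = Chem.Mol(mol)  # copy, so the caller's molecule is untouched
        Chem.RemoveStereochemistry(stripped)  # modifies the copy in place
        return stripped

    mol = Chem.MolFromSmiles(smiles)
    result, used_fallback = canonicalize_with_fallback(
        mol, enumerator.Canonicalize, strip_stereo
    )
    return Chem.MolToSmiles(result), used_fallback
```

Calling rdkit_demo() returns the canonical tautomer as SMILES together with a flag indicating whether the stereo-stripped retry was needed. On a version without the bug, the flag should stay False and the stereo annotation is left alone.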