rdkit

MolToSmiles fails with `Invariant Violation`

Open danpol opened this issue 5 years ago • 2 comments

Configuration:

  • RDKit Version: 2020.03.1
  • Are you using conda? Yes
  • If you are using conda, which channel did you install the rdkit from? -c rdkit

Description:

from rdkit import Chem
mol = list(Chem.ForwardSDMolSupplier('output.sdf', removeHs=False))[0]
Chem.MolToSmiles(mol)

Error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-90-0f5b2a163dca> in <module>
      1 mol = list(Chem.ForwardSDMolSupplier('output.sdf', removeHs=False))[0]
----> 2 Chem.MolToSmiles(mol)

RuntimeError: Invariant Violation
	inconsistent state
	Violation occurred on line 306 in file Code/GraphMol/Canon.cpp
	Failed Expression: ((firstFromAtom2->getBeginAtomIdx() == atom2->getIdx()) ^ (secondFromAtom2->getBeginAtomIdx() == atom2->getIdx()))
	RDKIT: 2020.03.1
	BOOST: 1_67

output.sdf:

0
     RDKit          3D

 47 50  0  0  0  0  0  0  0  0999 V2000
    2.8094    5.5347    3.9549 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.9803    5.7631    2.4518 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2057    4.7513    1.7096 N   0  0  0  0  0  0  0  0  0  0  0  0
    2.8065    3.5331    1.3600 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2813    2.9697    0.2259 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9852    2.6026   -1.1270 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9766    2.4326   -2.5030 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7497    2.4105   -3.1577 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.3271    3.2209   -3.0933 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.3880    2.8207   -4.1041 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.2755    3.7576   -5.2731 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.9456    5.0000   -5.1903 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.6406    5.8135   -4.1576 N   0  0  0  0  0  0  0  0  0  0  0  0
   -0.5843    5.5847   -2.8251 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.4862    4.3279   -2.2820 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.6119    4.4193   -0.7931 C   0  0  1  0  0  0  0  0  0  0  0  0
   -1.4265    3.3176   -0.2402 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.2401    2.0491    0.0429 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0161    1.3332    0.0094 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0021   -0.0041    0.0020 F   0  0  0  0  0  0  0  0  0  0  0  0
    1.1113    2.0881    0.0027 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.5911    5.0028   -0.1419 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.8009    5.0277    1.3433 C   0  0  1  0  0  0  0  0  0  0  0  0
   -0.1316    4.1335    2.1455 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.1695    4.5391    4.2144 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.7550    5.6196    4.2178 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.3819    6.2826    4.5035 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.0347    5.6782    2.1889 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.6202    6.7587    2.1923 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.6105    3.1378    1.9058 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.9068    2.3033   -3.0078 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.6465    1.5861   -3.8982 H   0  0  0  0  0  0  0  0  0  0  0  0
   -2.3743    2.9119   -3.6532 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.2180    1.7981   -4.4327 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.4892    3.3616   -6.2513 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.9057    5.4941   -6.1477 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.4194    6.7231   -4.4192 H   0  0  0  0  0  0  0  0  0  0  0  0
   -0.6137    6.4266   -2.1513 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.3690    5.2979   -0.7089 H   0  0  0  0  0  0  0  0  0  0  0  0
   -2.4993    3.6168   -0.0608 H   0  0  0  0  0  0  0  0  0  0  0  0
   -2.1306    1.4606    0.3069 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.5873    6.0839   -0.4506 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.4908    4.6415   -0.6473 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.6018    6.0731    1.6827 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0208    3.0946    1.8532 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.1654    4.4181    1.9498 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0815    4.2470    3.2084 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1 25  1  0  0  0  0
  1 26  1  0  0  0  0
  1 27  1  0  0  0  0
  2  3  1  0  0  0  0
  2 28  1  0  0  0  0
  2 29  1  0  0  0  0
  3  4  1  0  0  0  0
  3 23  1  0  0  0  0
  4  5  2  0  0  0  0
  4 30  1  0  0  0  0
  5  6  1  0  0  0  0
  5 21  1  0  0  0  0
  6  7  2  0  0  0  0
  6 21  1  0  0  0  0
  7  8  1  0  0  0  0
  7 31  1  0  0  0  0
  8  9  2  0  0  0  0
  8 32  1  0  0  0  0
  9 10  1  0  0  0  0
  9 15  1  0  0  0  0
 10 11  1  0  0  0  0
 10 33  1  0  0  0  0
 10 34  1  0  0  0  0
 11 12  2  0  0  0  0
 11 35  1  0  0  0  0
 12 13  1  0  0  0  0
 12 36  1  0  0  0  0
 13 14  1  0  0  0  0
 13 37  1  0  0  0  0
 14 15  2  0  0  0  0
 14 38  1  0  0  0  0
 15 16  1  0  0  0  0
 16 17  1  0  0  0  0
 16 22  1  0  0  0  0
 16 39  1  0  0  0  0
 17 18  2  0  0  0  0
 17 40  1  0  0  0  0
 18 19  1  0  0  0  0
 18 41  1  0  0  0  0
 19 20  1  0  0  0  0
 19 21  2  0  0  0  0
 22 23  1  0  0  0  0
 22 42  1  0  0  0  0
 22 43  1  0  0  0  0
 23 24  1  0  0  0  0
 23 44  1  0  0  0  0
 24 45  1  0  0  0  0
 24 46  1  0  0  0  0
 24 47  1  0  0  0  0
M  END
>  <id>  (1)
0

>  <smiles>  (1)
CCN1C=C2C3=CC=C4CC=CNC=C4C(C=CC(F)=C32)CC1C

$$$$

danpol, Apr 13 '20 10:04

A few notes about my investigation:

  1. The "removeHs=False" argument is not required to reproduce the failure.

  2. Open Babel converts this into a SMILES with a lot of stereochemistry:

>>> from openbabel import pybel
>>> for mol in pybel.readfile("sdf", "output.sdf"):
...   print(mol.write("smi"))
...
CCN1/C=C\2/C/3=C/C=C\4/CC=CNC=C4[C@@H](/C=C\C(=C23)\F)C[C@H]1C	0
  3. RDKit can convert the structure if sanitize=False and isomericSmiles=False:
>>> for mol in Chem.ForwardSDMolSupplier('output.sdf', sanitize=False):
...   print(Chem.MolToSmiles(mol, isomericSmiles=False))
...
[H]C1=C2C3=C(F)C([H])=C([H])C([H])(C4=C([H])N([H])C([H])=C([H])C([H])([H])C4=C1[H])C([H])([H])C([H])(C([H])([H])[H])N(C([H])([H])C([H])([H])[H])C([H])=C23

However, enabling even one of sanitize or isomericSmiles results in the Invariant Violation.

  4. This structure does not fail under 2016.09.3, though the result has no stereochemistry either:
>>> import rdkit; rdkit.__version__
'2016.09.3'
>>> for mol in Chem.ForwardSDMolSupplier('output.sdf', removeHs=True):
...   print(Chem.MolToSmiles(mol, isomericSmiles=True))
...
CCN1C=C2C3=CC=C4CC=CNC=C4C(C=CC(F)=C32)CC1C
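
The sanitize / isomericSmiles observation above can be checked in one pass by trying all four flag combinations and recording which ones raise. The following is only a sketch: the helper names `try_flag_combinations` and `rdkit_flag_report` are made up for illustration, and it assumes the Invariant Violation surfaces in Python as a `RuntimeError`, as in the traceback in the original report.

```python
from itertools import product


def try_flag_combinations(read_mols, write_smiles):
    """Run every (sanitize, isomericSmiles) combination and record the outcome.

    read_mols(sanitize) must return an iterable of molecules (None entries
    are skipped); write_smiles(mol, isomeric) must return a SMILES string.
    Both callables are hypothetical hooks, so this function is RDKit-free.
    """
    results = {}
    for sanitize, isomeric in product((False, True), repeat=2):
        try:
            results[(sanitize, isomeric)] = [
                write_smiles(mol, isomeric)
                for mol in read_mols(sanitize)
                if mol is not None
            ]
        except RuntimeError as exc:
            # Keep only the first line of the message, e.g. "Invariant Violation".
            text = str(exc).strip()
            first_line = text.splitlines()[0] if text else "RuntimeError"
            results[(sanitize, isomeric)] = "failed: " + first_line
    return results


def rdkit_flag_report(sdf_path):
    """Wire the generic helper to RDKit (imported lazily)."""
    from rdkit import Chem

    def read_mols(sanitize):
        return Chem.ForwardSDMolSupplier(sdf_path, sanitize=sanitize, removeHs=False)

    def write_smiles(mol, isomeric):
        return Chem.MolToSmiles(mol, isomericSmiles=isomeric)

    return try_flag_combinations(read_mols, write_smiles)
```

Calling `rdkit_flag_report('output.sdf')` returns a dict keyed by `(sanitize, isomericSmiles)`. Going by the notes above, on 2020.03.1 only the `(False, False)` entry would be expected to hold a SMILES list; other versions may behave differently.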

adalke, Apr 13 '20 18:04

Was this ever resolved @danpol or @adalke? I'm getting much the same error in 2023.09.1:

Exception has occurred: PanicException
python function failed RuntimeError: Invariant Violation
	inconsistent state
	Violation occurred on line 306 in file Code/GraphMol/Canon.cpp
	Failed Expression: ((firstFromAtom2->getBeginAtomIdx() == atom2->getIdx()) ^ (secondFromAtom2->getBeginAtomIdx() == atom2->getIdx()))
	RDKIT: 2023.09.1
	BOOST: 1_78

when calling Chem.MolToSmiles(mol).
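
To confirm that two reports hit the same check, the fields of the Invariant Violation message can be pulled apart and compared. This is a rough sketch using only the standard library: `parse_invariant` and `same_check` are hypothetical helper names, and the regular expressions assume the message layout shown in the two tracebacks in this thread. It compares the source file and failed expression rather than the line number, since line numbers can shift between releases.

```python
import re

# Patterns based on the message layout seen in this thread (an assumption).
_PATTERNS = {
    "line": r"Violation occurred on line (\d+)",
    "file": r"in file (\S+)",
    "expression": r"Failed Expression: (.+)",
    "rdkit": r"RDKIT: (\S+)",
    "boost": r"BOOST: (\S+)",
}


def parse_invariant(message):
    """Extract the fields of an Invariant Violation message; missing ones are None."""
    fields = {}
    for name, pattern in _PATTERNS.items():
        match = re.search(pattern, message)
        fields[name] = match.group(1).strip() if match else None
    if fields["line"] is not None:
        fields["line"] = int(fields["line"])
    return fields


def same_check(a, b):
    """True if two parsed reports name the same file and failed expression."""
    return a["file"] is not None and all(
        a[key] == b[key] for key in ("file", "expression")
    )


EXPR = ("((firstFromAtom2->getBeginAtomIdx() == atom2->getIdx()) ^ "
        "(secondFromAtom2->getBeginAtomIdx() == atom2->getIdx()))")

REPORT_2020 = (
    "RuntimeError: Invariant Violation\n\tinconsistent state\n"
    "\tViolation occurred on line 306 in file Code/GraphMol/Canon.cpp\n"
    "\tFailed Expression: " + EXPR + "\n\tRDKIT: 2020.03.1\n\tBOOST: 1_67\n"
)
REPORT_2023 = (
    "python function failed RuntimeError: Invariant Violation\n\tinconsistent state\n"
    "\tViolation occurred on line 306 in file Code/GraphMol/Canon.cpp\n"
    "\tFailed Expression: " + EXPR + "\n\tRDKIT: 2023.09.1\n\tBOOST: 1_78\n"
)

old, new = parse_invariant(REPORT_2020), parse_invariant(REPORT_2023)
print(old["rdkit"], new["rdkit"], same_check(old, new))
```

A match only shows that both versions trip the same invariant in Canon.cpp; it does not by itself show that the two input structures fail for the same reason.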

jemonat-work, Nov 10 '23 22:11