hgraph2graph
Some motifs in generated vocabulary are not parseable for rdkit
I was trying to build our customized language models. I found that the pattern "C1=CC=CCNCCcc[cH:1]CC=CCCCC=CCCC=CCCCCC=C1" generated by "get_vocab.py" is not parseable by RDKit.
So when I ran "preprocess.py", it raised an error in hgraph2graph/hgraph/vocab.py, line 65, in count_inters:
inters = [a for a in mol.GetAtoms() if a.GetAtomMapNum() > 0]
AttributeError: 'NoneType' object has no attribute 'GetAtoms'
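This happens because count_inters in vocab.py converts each SMILES string to a molecule on line 64 (mol = Chem.MolFromSmiles(s)). For an unparseable SMILES, Chem.MolFromSmiles returns None instead of raising, so the call to mol.GetAtoms() on the next line fails.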
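One possible stopgap, sketched here under assumptions, is to drop vocabulary entries that RDKit cannot parse before running preprocess.py. RDKit presumably rejects the pattern above because of the lowercase aromatic atoms (cc[cH:1]) sitting in an otherwise non-aromatic ring, but the sketch does not depend on the exact cause. It assumes each vocab line holds whitespace-separated SMILES strings. `filter_vocab_lines` is a hypothetical helper, not part of hgraph2graph, and filtering only hides the symptom; it does not explain why get_vocab.py emits such motifs.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def _rdkit_parser() -> Callable[[str], object]:
    # Imported lazily so the filter can also be exercised with a stub parser.
    from rdkit import Chem
    return Chem.MolFromSmiles


def filter_vocab_lines(
    lines: Iterable[str],
    parse: Optional[Callable[[str], object]] = None,
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: split vocab lines into (kept, rejected).

    Assumption: each line contains whitespace-separated SMILES strings.
    A line is rejected if any of its fields fails to parse.
    """
    parse = parse or _rdkit_parser()
    kept: List[str] = []
    rejected: List[str] = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        entry = line.rstrip("\n")
        # Chem.MolFromSmiles returns None on invalid SMILES; that None is
        # what later causes the AttributeError inside count_inters.
        if all(parse(s) is not None for s in fields):
            kept.append(entry)
        else:
            rejected.append(entry)
    return kept, rejected
```

For example, `kept, bad = filter_vocab_lines(open("vocab.txt"))` would return the parseable lines, which could be written to a new vocab file passed to preprocess.py (the filename here is an assumption). Injecting `parse` keeps the filtering logic independent of RDKit, so it can be checked without RDKit installed.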
I would appreciate it if someone could provide a solution.