Error in SMILES for CHEBI:26355

Open hrp1000 opened this issue 1 year ago • 2 comments

Hi, I have a problem with this entry. When I use the code below to generate fingerprints and calculate the Tanimoto coefficient against CHEBI:60344, it works fine for CHEBI:17627 but fails for CHEBI:26355:

from rdkit import Chem, DataStructs
from bioservices import ChEBI

heme = ChEBI()
heme_chebi_id = "CHEBI:60344"
heme_smiles = heme.getCompleteEntity(heme_chebi_id).smiles
target = Chem.MolFromSmiles(heme_smiles)
fp2 = Chem.RDKFingerprint(target)

for chebi_id in ["CHEBI:17627", "CHEBI:26355"]:
    ch = ChEBI()
    smiley = ch.getCompleteEntity(chebi_id).smiles
    print("reference:", heme_chebi_id)
    print("target: ", chebi_id)
    print("reference:", heme_smiles)
    print("target: ", smiley)
    ref = Chem.MolFromSmiles(smiley)
    fp1 = Chem.RDKFingerprint(ref)
    Tan = DataStructs.TanimotoSimilarity(fp1, fp2)
    print(Tan)
    print("-" * 64)

exit()
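For context, the Tanimoto coefficient is the number of bits set in both fingerprints divided by the number of bits set in either one. The following is a minimal pure-Python sketch of that definition, with each fingerprint modelled as a set of on-bit indices. The function name `tanimoto` and the example bit sets are illustrative assumptions, not part of RDKit.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch when both fingerprints are empty.
        return 0.0
    return len(a & b) / union


# Two shared bits (1 and 4) out of five distinct bits overall.
print(tanimoto({1, 4, 7, 9}, {1, 4, 8}))  # 0.4
```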

hrp1000, Jan 25 '23 15:01

Historically, coordination bonds have been depicted in a variety of ways, but IUPAC recommends that they be depicted as regular 'plain' bonds, as shown in CHEBI:26355 (see other examples: https://iupac.qmul.ac.uk/tetrapyrrole/TP8.html).

Some existing software, including RDKit, cannot properly interpret coordination bonds drawn as single bonds without charges. Such a structure does not satisfy RDKit's strict valence criteria, so RDKit is unable to generate a fingerprint for it. You can either contact RDKit about this issue or add charges to the structure to satisfy RDKit's criteria.
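One possible workaround on the RDKit side, not confirmed in this thread for CHEBI:26355, is to parse the SMILES without sanitization and then run every sanitization step except the valence check. The sketch below uses a small stand-in SMILES with a five-valent carbon rather than the ChEBI structure, and the helper name `mol_from_smiles_lenient` is hypothetical. A fingerprint computed this way comes from a structure RDKit considers chemically invalid, so treat the resulting similarity values with caution.

```python
from rdkit import Chem, DataStructs


def mol_from_smiles_lenient(smiles):
    """Parse SMILES; on failure, retry with the valence check skipped."""
    mol = Chem.MolFromSmiles(smiles)  # returns None when sanitization fails
    if mol is not None:
        return mol
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    if mol is None:
        return None  # the SMILES could not be parsed at all
    # Run all sanitization steps except SANITIZE_PROPERTIES (the valence check).
    Chem.SanitizeMol(
        mol,
        sanitizeOps=Chem.SanitizeFlags.SANITIZE_ALL
        ^ Chem.SanitizeFlags.SANITIZE_PROPERTIES,
    )
    return mol


# Stand-in example (an assumption, not the ChEBI entry): a five-valent carbon.
smiles = "CC(C)(C)(C)C"
strict = Chem.MolFromSmiles(smiles)        # None: valence check fails
lenient = mol_from_smiles_lenient(smiles)  # Mol object
fp = Chem.RDKFingerprint(lenient)
print(strict is None, fp.GetNumOnBits() > 0)
print(DataStructs.TanimotoSimilarity(fp, fp))
```

Other sanitization steps, such as kekulization, can still raise an exception for some structures, so a real script would probably wrap the `Chem.SanitizeMol` call in a `try`/`except` block.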

amalik01, Jan 25 '23 16:01

I'm not going to edit the structure (my hope is that it is correct in the major publicly accessible databases), so I think I'll have to contact RDKit...

hrp1000, Jan 25 '23 16:01