PubChemPy
JSON Decode Error when using similarity search
Hey,
I am trying to search PubChem for similar compounds with this call:
similars = pcp.get_compounds(smile, 'smiles', searchtype='similarity', threshold=0.7, as_dataframe=True)
This works well for some SMILES, for example for "Cc1noc(C)c1Br". But for others, e.g. "Cn1c(=O)c2nc(Cl)[nH]c2n(C)c1=O", I get the following error:
Traceback (most recent call last):
File "/home/caro/leval/.snakemake/scripts/tmpqq4csqb1.find_pubchem_hits.py", line 37, in <module>
similars = pcp.get_compounds("Cn1c(=O)c2nc(Cl)[nH]c2n(C)c1=O", 'smiles', searchtype='similarity', threshold=similarity_threshold, as_dataframe=True)
File "/home/caro/leval/.snakemake/conda/db9d54b7c1d0500c41e4539e39469ab2/lib/python3.9/site-packages/pubchempy.py", line 321, in get_compounds
results = get_json(identifier, namespace, searchtype=searchtype, **kwargs)
File "/home/caro/leval/.snakemake/conda/db9d54b7c1d0500c41e4539e39469ab2/lib/python3.9/site-packages/pubchempy.py", line 299, in get_json
return json.loads(get(identifier, namespace, domain, operation, 'JSON', searchtype, **kwargs).decode())
File "/home/caro/leval/.snakemake/conda/db9d54b7c1d0500c41e4539e39469ab2/lib/python3.9/site-packages/pubchempy.py", line 288, in get
status = json.loads(response.decode())
File "/home/caro/leval/.snakemake/conda/db9d54b7c1d0500c41e4539e39469ab2/lib/python3.9/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/home/caro/leval/.snakemake/conda/db9d54b7c1d0500c41e4539e39469ab2/lib/python3.9/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/home/caro/leval/.snakemake/conda/db9d54b7c1d0500c41e4539e39469ab2/lib/python3.9/json/decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 322816 column 7 (char 7196244)
If I turn the double quotation marks around the SMILES into single ones, I get
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 215438 column 3 (char 4812373)
I would be glad if you could help me here!
Cheers, Caro
Could you share a minimal working example (MWE) that reproduces this problem? With a minimal script such as
import pubchempy as pcp


def retrieve_similar(structure=""):
    """Retrieve PubChem entries of similar structure."""
    similars = pcp.get_compounds(structure,
                                 'smiles',
                                 searchtype='similarity',
                                 threshold=0.7,
                                 as_dataframe=True)
    print(similars)


# the example working fine
retrieve_similar("Cc1noc(C)c1Br")
(or retrieve_similar("Cn1c(=O)c2nc(Cl)[nH]c2n(C)c1=O"), respectively), the output for both calls looks to me like a successful interaction with the database (Python 3.9.2, PubChemPy 1.0.4). For documentation, the archive below includes a Jupyter notebook with a single run of this code.
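If the failure turns out to be intermittent (an assumption; nothing in this thread confirms it), a small wrapper that retries on json.JSONDecodeError and prints where decoding failed might help narrow things down. The helper name call_with_retry and its retries and delay parameters are hypothetical and not part of PubChemPy; this is only a sketch.

```python
import json
import time


def call_with_retry(func, *args, retries=3, delay=1.0, **kwargs):
    """Call func, retrying when its response cannot be decoded as JSON.

    Hypothetical helper for debugging; not part of PubChemPy.
    """
    for attempt in range(1, retries + 1):
        try:
            return func(*args, **kwargs)
        except json.JSONDecodeError as err:
            # Report where decoding failed. A different position on each
            # attempt would hint at a truncated or corrupted response.
            print(f"attempt {attempt}: {err.msg} at char {err.pos}")
            if attempt == retries:
                raise  # give up and show the original error
            time.sleep(delay)
```

It could be used as call_with_retry(pcp.get_compounds, smile, 'smiles', searchtype='similarity', threshold=0.7, as_dataframe=True). If the positions printed differ between attempts, the problem is more likely in the downloaded response than in the SMILES string.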
Thanks! With your example,
retrieve_similar("Cn1c(=O)c2nc(Cl)[nH]c2n(C)c1=O")
works perfectly for me too. However,
ligand_smile = "Cn1c(=O)c2nc(Cl)[nH]c2n(C)c1=O"
retrieve_similar(ligand_smile)
throws a JSON decode error again.
I found two fixes.
- Explicitly casting it to a string beforehand makes it work again:
retrieve_similar(str(ligand_smile))
This confuses me because type(ligand_smile) and type("Cn1c(=O)c2nc(Cl)[nH]c2n(C)c1=O") both give me <class 'str'>.
- I was using Python 3.7.10 and PubChemPy 1.0.4. Upgrading to Python 3.9 also fixed the problem.
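A quick check along the following lines may help explain the confusion about the str() cast. For a value that is already a str, str() returns an equal string, so the query sent to PubChem should be the same either way. That would suggest the cast only coincided with a successful request, though this is a guess. The helper name describe is made up for this sketch.

```python
literal = "Cn1c(=O)c2nc(Cl)[nH]c2n(C)c1=O"
ligand_smile = "Cn1c(=O)c2nc(Cl)[nH]c2n(C)c1=O"


def describe(value):
    """Return type, repr and length, to expose hidden characters."""
    return type(value), repr(value), len(value)


converted = str(ligand_smile)

# The variable and the literal should be indistinguishable.
print(describe(literal) == describe(ligand_smile))

# str() on a str yields an equal value.
print(converted == ligand_smile)

# In CPython it is typically even the very same object
# (an implementation detail, not a language guarantee).
print(converted is ligand_smile)
```

If repr() reveals a difference in the real script (for example trailing whitespace or a newline picked up when the SMILES was read from a file), that would be worth reporting here as well.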