JSONDecodeError when using a small background list
Setup
I am reporting a problem with GSEApy version, Python version, and operating system as follows:
>>> import sys; print(sys.version)
3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:34:54) [Clang 16.0.6 ]
>>> import platform; print(platform.python_implementation()); print(platform.platform())
CPython
macOS-14.2.1-arm64-arm-64bit
>>> import gseapy; print(gseapy.__version__)
1.1.5
Expected behaviour
import gseapy as gp

enr_bg = gp.enrichr(gene_list=gene_list,
                    gene_sets=['MSigDB_Hallmark_2020','KEGG_2021_Human'],
                    # organism='human', # organism argument is ignored because user input a background
                    background="tests/data/background.txt",
                    outdir=None, # don't write to disk
                    )
The above is copied directly from the gseapy documentation, using the same gene_list and background provided there. However, it raises an error when I switch to gene_sets=['MGI_Mammalian_Phenotype_2017'] (see below).
Actual behaviour
JSONDecodeError Traceback (most recent call last)
File ~/hx/.pixi/envs/default/lib/python3.11/site-packages/requests/models.py:963, in Response.json(self, **kwargs)
962 try:
--> 963 return complexjson.loads(self.content.decode(encoding), **kwargs)
964 except UnicodeDecodeError:
965 # Wrong UTF codec detected; usually because it's not UTF-8
966 # but some other 8-bit codec. This is an RFC violation,
967 # and the server didn't bother to tell us what codec *was*
968 # used.
File ~/hx/.pixi/envs/default/lib/python3.11/site-packages/simplejson/__init__.py:514, in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, use_decimal, allow_nan, **kw)
510 if (cls is None and encoding is None and object_hook is None and
511 parse_int is None and parse_float is None and
512 parse_constant is None and object_pairs_hook is None
513 and not use_decimal and not allow_nan and not kw):
--> 514 return _default_decoder.decode(s)
515 if cls is None:
File ~/hx/.pixi/envs/default/lib/python3.11/site-packages/simplejson/decoder.py:386, in JSONDecoder.decode(self, s, _w, _PY3)
385 s = str(s, self.encoding)
--> 386 obj, end = self.raw_decode(s)
387 end = _w(s, end).end()
File ~/hx/.pixi/envs/default/lib/python3.11/site-packages/simplejson/decoder.py:416, in JSONDecoder.raw_decode(self, s, idx, _w, _PY3)
415 idx += 3
--> 416 return self.scan_once(s, idx=_w(s, idx).end())
JSONDecodeError: Expecting value: line 1 column 10266 (char 10265)
During handling of the above exception, another exception occurred:
JSONDecodeError Traceback (most recent call last)
Cell In[7], line 1
----> 1 enr_bg = gp.enrichr(gene_list=gene_list,
2 gene_sets=['MGI_Mammalian_Phenotype_2017'],
3 # organism='human', # organism argment is ignored because user input a background
4 background="tests/data/background.txt",
5 outdir=None, # don't write to disk
6 )
File ~/hx/.pixi/envs/default/lib/python3.11/site-packages/gseapy/__init__.py:554, in enrichr(gene_list, gene_sets, organism, outdir, background, cutoff, format, figsize, top_term, no_plot, verbose)
552 # set organism
553 enr.set_organism()
--> 554 enr.run()
556 return enr
File ~/hx/.pixi/envs/default/lib/python3.11/site-packages/gseapy/enrichr.py:652, in Enrichr.run(self)
650 # whether user input background
651 if isinstance(bg, set) and len(bg) > 0:
--> 652 shortID, res = self.get_results_with_background(genes_list, bg)
653 else:
654 shortID, res = self.get_results(genes_list)
File ~/hx/.pixi/envs/default/lib/python3.11/site-packages/gseapy/enrichr.py:297, in Enrichr.get_results_with_background(self, gene_list, background)
293 self._logger.error("Error fetching enrichment results: %s" % self._gs)
295 # print(response.text[5700:5900])
--> 297 data = response.json()
298 # Note: missig Overlap column
299 colnames = [
300 "Rank",
301 "Term",
(...)
308 "Old adjusted P-value",
309 ]
File ~/hx/.pixi/envs/default/lib/python3.11/site-packages/requests/models.py:971, in Response.json(self, **kwargs)
969 pass
970 except JSONDecodeError as e:
--> 971 raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
973 try:
974 return complexjson.loads(self.text, **kwargs)
JSONDecodeError: Expecting value: line 1 column 10266 (char 10265)
Steps to reproduce
Swap the gene_sets parameter in the example above for ['MGI_Mammalian_Phenotype_2017'] and the call should raise the JSONDecodeError.
Looking at response.text, some result rows look like this:
[35,"MP:0008729 decreased memory B cell number",3.392253592257852E-5, Infinity, Infinity, ["PTPRC","JAK3","TLR4"],0.0024278843567445483, 0, 0 ]
I think the two Infinity values are the offending ones. They appear to be the odds ratio and the combined score. Presumably this happens because the background gene list does not contain any of the genes in the given gene set.
This becomes a much bigger problem when analyzing gene sets from proteomics experiments, which typically detect far fewer than 10k genes.
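To illustrate how a ratio statistic can end up as Infinity, here is a minimal sketch of a textbook 2x2 sample odds ratio. This is only an assumption for illustration: the thread does not confirm which formula the Enrichr server uses, and the function name and the counts below are hypothetical.

```python
import math


def sample_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Textbook odds ratio (a*d)/(b*c) for a 2x2 table.

    a: input genes in the term        b: input genes not in the term
    c: other background genes in term d: other background genes not in term
    Returns inf when the denominator is zero and the numerator is positive,
    and nan when both are zero.
    """
    numerator = a * d
    denominator = b * c
    if denominator == 0:
        return math.inf if numerator > 0 else math.nan
    return numerator / denominator


# Hypothetical counts: with a small background, no gene outside the
# input list belongs to the term, so c == 0 and the ratio is unbounded.
small_background = sample_odds_ratio(a=3, b=20, c=0, d=500)
larger_background = sample_odds_ratio(a=3, b=20, c=5, d=500)
print(small_background, larger_background)
```

With a small background, zero cells in the table become much more likely, which would be consistent with the Infinity values showing up more often for proteomics-sized backgrounds.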
Thanks!
I can't reproduce the bug on my end. Can you re-run the code and try again?
I'm on macOS with an M3 chip.
I'm still able to reproduce the same error when I swap the gene set to MGI_Mammalian_Phenotype_2017:
enr_bg = gp.enrichr(gene_list=gene_list,
                    gene_sets=['MGI_Mammalian_Phenotype_2017'],
                    # organism='human', # organism argument is ignored because user input a background
                    background="tests/data/background.txt",
                    outdir=None, # don't write to disk
                    )
Another way to reproduce a similar JSONDecodeError is to replace background with the original gene_list:
enr_bg = gp.enrichr(gene_list=gene_list,
                    gene_sets=['MGI_Mammalian_Phenotype_2017'],
                    # organism='human', # organism argument is ignored because user input a background
                    background="tests/data/gene_list.txt",
                    outdir=None, # don't write to disk
                    )
You should hit the same Infinity values, and therefore the same error, with the test case above.
FWIW I'm on an Apple M3 Max chip.
Thanks!
That's weird. With my test dataset it works, even after trying 5 times.
Huh, did you find the following record in your output? I suppose the json module in Python doesn't support decoding Infinity.
[35,"MP:0008729 decreased memory B cell number",3.392253592257852E-5, Infinity, Infinity, ["PTPRC","JAK3","TLR4"],0.0024278843567445483, 0, 0 ]
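For reference, a quick way to see how a lenient and a strict decoder treat the row quoted in this thread. The standard-library json module accepts the bare Infinity token by default. The strict case below is only emulated with a parse_constant hook (the hook name reject_constant is made up for this sketch); it is not simplejson itself.

```python
import json
import math

# The row from this thread, written as a complete JSON array.
row_text = (
    '[35,"MP:0008729 decreased memory B cell number",3.392253592257852E-5, '
    'Infinity, Infinity, ["PTPRC","JAK3","TLR4"],0.0024278843567445483, 0, 0 ]'
)

# Default stdlib behaviour: Infinity is accepted and decoded as float('inf').
lenient = json.loads(row_text)
print(lenient[3], math.isinf(lenient[4]))


def reject_constant(name: str):
    """Hypothetical hook: refuse NaN / Infinity / -Infinity like a strict parser would."""
    raise ValueError(f"non-standard JSON constant: {name}")


# Emulated strict behaviour: the same text now fails to decode.
strict_error = None
try:
    json.loads(row_text, parse_constant=reject_constant)
except ValueError as exc:
    strict_error = str(exc)
print(strict_error)
```

So the same response body can decode fine in one environment and fail in another, depending on how strict the JSON backend is.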
Is your requests package outdated?
We are on the same version. How about simplejson?
(hx) (base) karenwong@Karen-Wong-Macbook hx % pixi list | grep requests
requests 2.32.3 pyhd8ed1ab_1 57.3 KiB conda requests
(hx) (base) karenwong@Karen-Wong-Macbook hx % pixi list | grep simplejson
simplejson 3.19.3 py311h460d6c5_1 129.8 KiB conda simplejson
I don't have simplejson installed, so the bug comes from simplejson.
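For anyone wanting to check their own environment: as far as I know, requests.compat imports simplejson when it is importable and falls back to the built-in json otherwise. A minimal sketch that mirrors that preference using only the standard library (the helper name is hypothetical, and this does not import requests itself):

```python
import importlib.util


def likely_requests_json_backend() -> str:
    """Guess which JSON module requests would pick in this environment.

    Assumption: requests prefers simplejson if it can be imported,
    otherwise it uses the standard-library json module.
    """
    if importlib.util.find_spec("simplejson") is not None:
        return "simplejson"
    return "json"


backend = likely_requests_json_backend()
print(backend)
```

If requests is installed, comparing this against requests.compat.json.__name__ should confirm the guess.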
Thanks for checking! Would you be able to support simplejson as well?
According to this, allow_nan defaults to False...
simplejson is an optional dependency of requests. I think you can submit an issue to the simplejson or requests team to fix this.
Thanks for the quick reply!
The simplejson module recently updated the default value of allow_nan from True to False in its latest version. Supporting NaN, Infinity, and -Infinity is actually outside the JSON spec, so they may have decided to change the default behavior to align with that.
Looking at the documentation for both simplejson and the built-in json module, we can simply add allow_nan=True to this line of your code to ensure compatibility with both versions. I've tested it on different systems, both with and without simplejson and it works well.
Thanks, but allow_nan breaks my codebase, so I reverted it back to the default:
TypeError: JSONDecoder.__init__() got an unexpected keyword argument 'allow_nan'
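The TypeError can be reproduced without requests at all. In the standard-library json module, allow_nan is a parameter of dumps, not loads; unknown keyword arguments passed to json.loads are forwarded to the JSONDecoder constructor, which rejects them. A minimal sketch:

```python
import json

text = "[1.0, Infinity]"

# Passing allow_nan to the stdlib decoder fails: JSONDecoder has no such argument.
error_message = None
try:
    json.loads(text, allow_nan=True)
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Without the extra keyword, the stdlib decoder accepts Infinity by default.
values = json.loads(text)
print(values)
```

That is why response.json(allow_nan=True) only works when requests is backed by simplejson.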
Does the following work?
if 'simplejson' in requests.compat.json.__name__:
    data = response.json(allow_nan=True)
else:
    data = response.json()
I think the better solution is this:
data = json.loads(response.content)
I prefer to use the built-in library rather than test the new simplejson as a dependency.
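A small sketch of what that approach looks like on a response body containing Infinity. FakeResponse is a hypothetical stand-in for requests.Response that exposes only .content, and the payload is the row from this thread wrapped in an outer list purely for illustration; the real Enrichr payload layout may differ.

```python
import json
import math


class FakeResponse:
    """Hypothetical stand-in for requests.Response with only a bytes .content."""

    def __init__(self, content: bytes) -> None:
        self.content = content


response = FakeResponse(
    b'[[35,"MP:0008729 decreased memory B cell number",3.392253592257852E-5, '
    b'Infinity, Infinity, ["PTPRC","JAK3","TLR4"],0.0024278843567445483, 0, 0 ]]'
)

# json.loads accepts bytes directly and decodes Infinity to float('inf').
data = json.loads(response.content)

# Report which positions in the first row hold non-finite floats,
# so downstream code can decide how to handle them.
row = data[0]
non_finite = [
    index
    for index, value in enumerate(row)
    if isinstance(value, float) and not math.isfinite(value)
]
print(non_finite)
```

Because this bypasses response.json(), the result no longer depends on whether simplejson happens to be installed alongside requests.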
Will this fix be included in a release?
It's already included in the latest release.