JSONDecodeError when using a small background list

Open wongkarenhy-hex opened this issue 10 months ago • 15 comments

Setup

I am reporting a problem with GSEApy version, Python version, and operating system as follows:

>>> import sys; print(sys.version)
3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:34:54) [Clang 16.0.6 ]
>>> import platform; print(platform.python_implementation()); print(platform.platform())
CPython
macOS-14.2.1-arm64-arm-64bit
>>> import gseapy; print(gseapy.__version__)
1.1.5

Expected behaviour

import gseapy as gp

enr_bg = gp.enrichr(gene_list=gene_list,
                 gene_sets=['MSigDB_Hallmark_2020','KEGG_2021_Human'],
                 # organism='human', # organism argument is ignored because user input a background
                 background="tests/data/background.txt",
                 outdir=None, # don't write to disk
                )

The above is copied directly from the gseapy documentation, using the same gene_list and background as provided. However, it raises an error when I switch to gene_sets=['MGI_Mammalian_Phenotype_2017'] (see below).

Actual behaviour

JSONDecodeError                           Traceback (most recent call last)
File ~/hx/.pixi/envs/default/lib/python3.11/site-packages/requests/models.py:963, in Response.json(self, **kwargs)
    962 try:
--> 963     return complexjson.loads(self.content.decode(encoding), **kwargs)
    964 except UnicodeDecodeError:
    965     # Wrong UTF codec detected; usually because it's not UTF-8
    966     # but some other 8-bit codec.  This is an RFC violation,
    967     # and the server didn't bother to tell us what codec *was*
    968     # used.

File ~/hx/.pixi/envs/default/lib/python3.11/site-packages/simplejson/__init__.py:514, in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, use_decimal, allow_nan, **kw)
    510 if (cls is None and encoding is None and object_hook is None and
    511         parse_int is None and parse_float is None and
    512         parse_constant is None and object_pairs_hook is None
    513         and not use_decimal and not allow_nan and not kw):
--> 514     return _default_decoder.decode(s)
    515 if cls is None:

File ~/hx/.pixi/envs/default/lib/python3.11/site-packages/simplejson/decoder.py:386, in JSONDecoder.decode(self, s, _w, _PY3)
    385     s = str(s, self.encoding)
--> 386 obj, end = self.raw_decode(s)
    387 end = _w(s, end).end()

File ~/hx/.pixi/envs/default/lib/python3.11/site-packages/simplejson/decoder.py:416, in JSONDecoder.raw_decode(self, s, idx, _w, _PY3)
    415         idx += 3
--> 416 return self.scan_once(s, idx=_w(s, idx).end())

JSONDecodeError: Expecting value: line 1 column 10266 (char 10265)

During handling of the above exception, another exception occurred:

JSONDecodeError                           Traceback (most recent call last)
Cell In[7], line 1
----> 1 enr_bg = gp.enrichr(gene_list=gene_list,
      2                  gene_sets=['MGI_Mammalian_Phenotype_2017'],
      3                  # organism='human', # organism argment is ignored because user input a background
      4                  background="tests/data/background.txt",
      5                  outdir=None, # don't write to disk
      6                 )

File ~/hx/.pixi/envs/default/lib/python3.11/site-packages/gseapy/__init__.py:554, in enrichr(gene_list, gene_sets, organism, outdir, background, cutoff, format, figsize, top_term, no_plot, verbose)
    552 # set organism
    553 enr.set_organism()
--> 554 enr.run()
    556 return enr

File ~/hx/.pixi/envs/default/lib/python3.11/site-packages/gseapy/enrichr.py:652, in Enrichr.run(self)
    650 # whether user input background
    651 if isinstance(bg, set) and len(bg) > 0:
--> 652     shortID, res = self.get_results_with_background(genes_list, bg)
    653 else:
    654     shortID, res = self.get_results(genes_list)

File ~/hx/.pixi/envs/default/lib/python3.11/site-packages/gseapy/enrichr.py:297, in Enrichr.get_results_with_background(self, gene_list, background)
    293     self._logger.error("Error fetching enrichment results: %s" % self._gs)
    295 # print(response.text[5700:5900])
--> 297 data = response.json()
    298 # Note: missig Overlap column
    299 colnames = [
    300     "Rank",
    301     "Term",
   (...)
    308     "Old adjusted P-value",
    309 ]

File ~/hx/.pixi/envs/default/lib/python3.11/site-packages/requests/models.py:971, in Response.json(self, **kwargs)
    969             pass
    970         except JSONDecodeError as e:
--> 971             raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
    973 try:
    974     return complexjson.loads(self.text, **kwargs)

JSONDecodeError: Expecting value: line 1 column 10266 (char 10265)

Steps to reproduce

Just switch out the gene_sets param from the above example and it should hit the JSONDecodeError.

Looking at the response.text object, some results apparently look like this:

[35,"MP:0008729 decreased memory B cell number",3.392253592257852E-5, Infinity, Infinity, ["PTPRC","JAK3","TLR4"],0.0024278843567445483, 0, 0 ] 

I think the two Infinity values are the offending ones. They appear to be the odds ratio and the combined score. Presumably, this happens because the background gene list does not contain any of the genes in the given gene set.
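A minimal sketch of why this record trips a strict parser, using only the built-in json module. Bare Infinity is not valid JSON, but the built-in decoder accepts it by default. The `reject_constant` hook is a hypothetical helper that emulates a strict decoder; it is not how simplejson is implemented.

```python
import json
import math

# Record copied from the response text above; the bare Infinity tokens
# fall outside the JSON specification.
record = (
    '[35,"MP:0008729 decreased memory B cell number",3.392253592257852E-5, '
    'Infinity, Infinity, ["PTPRC","JAK3","TLR4"],0.0024278843567445483, 0, 0 ]'
)

def reject_constant(name):
    # Hypothetical hook: the built-in decoder calls parse_constant with
    # 'NaN', 'Infinity' or '-Infinity' whenever it meets one of those tokens.
    raise ValueError(f"non-standard JSON constant: {name}")

# Lenient parse (built-in default): Infinity becomes float('inf').
lenient = json.loads(record)

# Strict parse (emulated): the first Infinity token is rejected.
try:
    json.loads(record, parse_constant=reject_constant)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)

print(lenient[3], lenient[4])
print(strict_error)
```

If this reading is right, the failure depends on which JSON backend ends up parsing the response, not on the request itself.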

This becomes a much bigger problem when analyzing gene sets from proteomics experiments, which typically detect far fewer than 10k genes.

Thanks!

wongkarenhy-hex avatar Feb 06 '25 00:02 wongkarenhy-hex

I can't reproduce the bug on my end. Can you re-run the code and try again?

I'm on macOS with an M3 chip.

zqfang avatar Feb 06 '25 21:02 zqfang

I'm still able to reproduce the same error when I swap the gene set to MGI_Mammalian_Phenotype_2017:

enr_bg = gp.enrichr(gene_list=gene_list,
                 gene_sets=['MGI_Mammalian_Phenotype_2017'],
                 # organism='human', # organism argument is ignored because user input a background
                 background="tests/data/background.txt",
                 outdir=None, # don't write to disk
                )

Another way to reproduce a similar JSONDecodeError is to replace background with the original gene_list:

enr_bg = gp.enrichr(gene_list=gene_list,
                 gene_sets=['MGI_Mammalian_Phenotype_2017'],
                 # organism='human', # organism argument is ignored because user input a background
                 background="tests/data/gene_list.txt",
                 outdir=None, # don't write to disk
                )

You should run into the same Infinity values in the response with the above test case.

FWIW I'm on Apple M3 Max chip.

Thanks!

wongkarenhy-hex avatar Feb 06 '25 22:02 wongkarenhy-hex

That's weird. With my test dataset, it works even after 5 tries.

zqfang avatar Feb 06 '25 23:02 zqfang

Huh, did you find the following record in your output? I suppose the json module in Python doesn't support decoding Infinity.

[35,"MP:0008729 decreased memory B cell number",3.392253592257852E-5, Infinity, Infinity, ["PTPRC","JAK3","TLR4"],0.0024278843567445483, 0, 0 ]

wongkarenhy-hex avatar Feb 06 '25 23:02 wongkarenhy-hex

Is your requests package outdated?

zqfang avatar Feb 06 '25 23:02 zqfang

We are on the same version. How about simplejson?

(hx) (base) karenwong@Karen-Wong-Macbook hx % pixi list | grep requests          
requests                              2.32.3          pyhd8ed1ab_1              57.3 KiB   conda  requests
(hx) (base) karenwong@Karen-Wong-Macbook hx % pixi list | grep simplejson
simplejson                            3.19.3          py311h460d6c5_1           129.8 KiB  conda  simplejson

wongkarenhy-hex avatar Feb 07 '25 00:02 wongkarenhy-hex

I don't have simplejson installed, so the bug comes from simplejson.

zqfang avatar Feb 07 '25 00:02 zqfang

Thanks for checking! Would you be able to support simplejson as well?

According to this, allow_nan defaults to False...

wongkarenhy-hex avatar Feb 07 '25 00:02 wongkarenhy-hex

simplejson is an optional dependency of requests. I think you can submit an issue to the simplejson/requests team to fix this.

zqfang avatar Feb 07 '25 18:02 zqfang

Thanks for the quick reply!

The simplejson module recently updated the default value of allow_nan from True to False in its latest version. Supporting NaN, Infinity, and -Infinity is actually outside the JSON spec, so they may have decided to change the default behavior to align with that.

Looking at the documentation for both simplejson and the built-in json module, we can simply add allow_nan=True to this line of your code to ensure compatibility with both versions. I've tested it on different systems, both with and without simplejson and it works well.

wongkarenhy-hex avatar Feb 07 '25 19:02 wongkarenhy-hex

Thanks, but allow_nan breaks my codebase, so I reverted it to the default:

TypeError: JSONDecoder.__init__() got an unexpected keyword argument 'allow_nan'
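The TypeError above can be reproduced without requests, assuming the built-in json module is the active backend (as on the Python 3.11 in this report). The sketch below shows `json.loads` forwarding an unknown keyword to `JSONDecoder`, which does not accept `allow_nan`. `loads_with_allow_nan` is a hypothetical helper for illustration only.

```python
import json

def loads_with_allow_nan(text):
    """Try to parse with allow_nan=True; return (value, error_message)."""
    try:
        # json.loads passes extra keyword arguments to JSONDecoder.__init__,
        # which has no allow_nan parameter in the built-in module.
        return json.loads(text, allow_nan=True), None
    except TypeError as exc:
        return None, str(exc)

value, error = loads_with_allow_nan("[1, Infinity]")
print(error)
```

This suggests that passing allow_nan=True unconditionally is only safe when simplejson is the backend in use.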

zqfang avatar Feb 09 '25 23:02 zqfang

Does the following work?

if 'simplejson' in requests.compat.json.__name__:
    data = response.json(allow_nan=True)
else:
    data = response.json()

wongkarenhy-hex avatar Feb 10 '25 17:02 wongkarenhy-hex

I think the better solution is this:

data = json.loads(response.content)

I prefer to use the built-in library instead of testing the new simplejson as a dependency.
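A minimal sketch of that approach, assuming the response body arrives as UTF-8 bytes and that the built-in json module keeps its default lenient handling of Infinity. `parse_body` and the sample payload are hypothetical stand-ins built from the record quoted earlier in this thread; they are not GSEApy code or the actual Enrichr response layout.

```python
import json
import math

def parse_body(content: bytes):
    # Hypothetical helper: decode raw response bytes with the built-in
    # json module, bypassing whichever backend requests would pick.
    return json.loads(content)

# Stand-in for response.content, containing bare Infinity tokens.
content = (
    b'[[35, "MP:0008729 decreased memory B cell number", 3.392253592257852E-5, '
    b'Infinity, Infinity, ["PTPRC", "JAK3", "TLR4"], 0.0024278843567445483, 0, 0]]'
)

rows = parse_body(content)
first = rows[0]
print(first[1], first[3], first[4])
```

The parsed values are `float('inf')`, so any downstream sorting or plotting would still need to cope with infinite odds ratios and combined scores.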

zqfang avatar Feb 10 '25 23:02 zqfang

Will this fix be included in a release?

alam-shahul avatar Apr 26 '25 19:04 alam-shahul

It's already included in the latest release.

zqfang avatar Apr 27 '25 05:04 zqfang