Difference of results for simple prerank test between v1.1.4 and v1.1.8

Open guillaumeap opened this issue 10 months ago • 5 comments

Setup

I am reporting a problem with GSEApy version, Python version, and operating system as follows:

import sys; print(sys.version)
import platform; print(platform.python_implementation()); print(platform.platform())
import gseapy; print(gseapy.__version__)

3.11.4 (main, May 15 2025, 17:30:48) [Clang 15.0.0 (clang-1500.1.0.2.5)]
CPython
macOS-15.4.1-arm64-arm-64bit
1.1.8

Expected behaviour

I am running a simple prerank test:

import pandas as pd
import gseapy
rnk = pd.DataFrame({"gene_name":["TP53", "NFE2L2", "CTNNB1", "KEAP1", "BRCA2"], "rank":[0,1,2,3,4]})
gseapy.prerank(rnk=rnk, gene_sets=["h.all.v2023.1.Hs.symbols.gmt"])

With v1.1.8, this gives the following result:

pd.DataFrame({'Term': ['h.all.v2023.1.Hs.symbols.gmt__HALLMARK_E2F_TARGETS', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_DNA_REPAIR', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_P53_PATHWAY', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_MITOTIC_SPINDLE', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_G2M_CHECKPOINT', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_TNFA_SIGNALING_VIA_NFKB', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_WNT_BETA_CATENIN_SIGNALING', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_CHOLESTEROL_HOMEOSTASIS', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_TGF_BETA_SIGNALING', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_APOPTOSIS'], 'ES': [1.0, -1.0, -1.0, 1.0, 1.0, -0.75, -0.6666666666666666, -0.5, -0.5, -0.5], 'NES': [1.3204175679488308, -1.3203463203463204, -1.3203463203463204, 1.147222222222222, 1.147222222222222, -1.01, -0.8575498575498598, -0.6674008810572688, -0.6674008810572688, -0.6674008810572688], 'FDR q-val': [0.260528038402793, 1.0, 1.0, 0.2976216452105608, 0.2976216452105608, 0.8739769860316289, 1.0, 0.8552704448956987, 0.8552704448956987, 0.8552704448956987]})

Actual behaviour

With v1.1.4, the exact same script and data give this result instead:

pd.DataFrame({'Term': ['h.all.v2023.1.Hs.symbols.gmt__HALLMARK_P53_PATHWAY', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_DNA_REPAIR', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_E2F_TARGETS', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_G2M_CHECKPOINT', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_MITOTIC_SPINDLE', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_TNFA_SIGNALING_VIA_NFKB', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_WNT_BETA_CATENIN_SIGNALING', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_APOPTOSIS', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_CHOLESTEROL_HOMEOSTASIS', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_TGF_BETA_SIGNALING'], 'ES': [-1.0, -1.0, 1.0, 1.0, 1.0, -0.75, -0.6666666666666666, -0.5, -0.5, -0.5], 'NES': [-1.3708133971291865, -1.3708133971291865, 1.3529243392805943, 1.150521609538003, 1.150521609538003, -0.9722948248823836, -0.8661870503597152, -0.6623164763458401, -0.6623164763458401, -0.6623164763458401], 'FDR q-val': [0.2172547570663218, 0.2172547570663218, 0.1255722694571615, 0.2064530194026597, 0.2064530194026597, 1.0, 1.0, 0.9648993164603732, 0.9648993164603732, 0.9648993164603732]})

Steps to reproduce

The gene set order has changed, as have the NES and FDR q-val values and other statistics.

v1.1.7 gives the same result as v1.1.8, so something changed between v1.1.4 and v1.1.7, but I can't find what. Thank you for your help.
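Because several gene sets have identical statistics, their row order can differ without the values themselves differing, so it may be easier to compare the two results by Term rather than by row position. The following is only a minimal sketch in plain Python: the NES values are copied from the two results above (Term prefix dropped, only four terms shown), and the helper name nes_differences is hypothetical, not part of GSEApy.

```python
# NES values copied from the v1.1.8 and v1.1.4 results in this report.
# Only four terms are shown; the file-name prefix of each Term is dropped.
nes_v118 = {
    "HALLMARK_E2F_TARGETS": 1.3204175679488308,
    "HALLMARK_DNA_REPAIR": -1.3203463203463204,
    "HALLMARK_P53_PATHWAY": -1.3203463203463204,
    "HALLMARK_TNFA_SIGNALING_VIA_NFKB": -1.01,
}
nes_v114 = {
    "HALLMARK_P53_PATHWAY": -1.3708133971291865,
    "HALLMARK_DNA_REPAIR": -1.3708133971291865,
    "HALLMARK_E2F_TARGETS": 1.3529243392805943,
    "HALLMARK_TNFA_SIGNALING_VIA_NFKB": -0.9722948248823836,
}


def nes_differences(old, new):
    """Return {term: new - old} for the terms present in both results.

    Hypothetical helper for illustration; it matches rows by term name,
    so the row order of the two result tables does not matter.
    """
    shared = sorted(set(old) & set(new))
    return {term: new[term] - old[term] for term in shared}


diffs = nes_differences(nes_v114, nes_v118)
for term, delta in diffs.items():
    print(f"{term}: {delta:+.4f}")
```

Matching on the term name keeps ties such as HALLMARK_P53_PATHWAY and HALLMARK_DNA_REPAIR from showing up as spurious differences when only their order changed.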

guillaumeap avatar May 22 '25 09:05 guillaumeap

Could you post the result as a dataframe instead of dicts (e.g. gseapy.prerank(...).res2d)? I'd like to see what happened. This may be related to:

  • issue #299
  • gene name checking in >= v1.1.6

zqfang avatar May 22 '25 18:05 zqfang

Sure, sorry about that.

Here is the input:

  gene_name  rank
0      TP53     0
1    NFE2L2     1
2    CTNNB1     2
3     KEAP1     3
4     BRCA2     4

output for v1.1.8

      Name                                               Term        ES       NES  NOM p-val  FDR q-val  FWER p-val Tag %   Gene %   Lead_genes
0  prerank  h.all.v2023.1.Hs.symbols.gmt__HALLMARK_E2F_TAR...  1.000000  1.320418   0.299145   0.260528       0.368   1/2   20.00%        BRCA2
1  prerank  h.all.v2023.1.Hs.symbols.gmt__HALLMARK_DNA_REPAIR -1.000000 -1.320346   0.359016   1.000000       0.595   1/1  100.00%         TP53
2  prerank  h.all.v2023.1.Hs.symbols.gmt__HALLMARK_P53_PAT... -1.000000 -1.320346   0.359016   1.000000       0.595   1/1  100.00%         TP53
3  prerank  h.all.v2023.1.Hs.symbols.gmt__HALLMARK_MITOTIC...  1.000000  1.147222   0.486683   0.297622       0.560   1/1   20.00%        BRCA2
4  prerank  h.all.v2023.1.Hs.symbols.gmt__HALLMARK_G2M_CHE...  1.000000  1.147222   0.486683   0.297622       0.560   1/1   20.00%        BRCA2
5  prerank  h.all.v2023.1.Hs.symbols.gmt__HALLMARK_TNFA_SI... -0.750000 -1.010000   0.650165   0.873977       0.814   1/1   60.00%       NFE2L2
6  prerank  h.all.v2023.1.Hs.symbols.gmt__HALLMARK_WNT_BET... -0.666667 -0.857550   1.000000   1.000000       1.000   2/2   80.00%  TP53;CTNNB1
7  prerank  h.all.v2023.1.Hs.symbols.gmt__HALLMARK_CHOLEST... -0.500000 -0.667401   1.000000   0.855270       1.000   1/1   80.00%       CTNNB1
8  prerank  h.all.v2023.1.Hs.symbols.gmt__HALLMARK_TGF_BET... -0.500000 -0.667401   1.000000   0.855270       1.000   1/1   80.00%       CTNNB1
9  prerank   h.all.v2023.1.Hs.symbols.gmt__HALLMARK_APOPTOSIS -0.500000 -0.667401   1.000000   0.855270       1.000   1/1   80.00%       CTNNB1

output for v1.1.4

      Name                                               Term        ES       NES  NOM p-val  FDR q-val  FWER p-val Tag %   Gene %   Lead_genes
0  prerank  h.all.v2023.1.Hs.symbols.gmt__HALLMARK_P53_PAT... -1.000000 -1.370813   0.293194   0.217255       0.000   1/1  100.00%         TP53
1  prerank  h.all.v2023.1.Hs.symbols.gmt__HALLMARK_DNA_REPAIR -1.000000 -1.370813   0.293194   0.217255       0.000   1/1  100.00%         TP53
2  prerank  h.all.v2023.1.Hs.symbols.gmt__HALLMARK_E2F_TAR...  1.000000  1.352924   0.270804   0.125572       0.192   1/2   20.00%        BRCA2
3  prerank  h.all.v2023.1.Hs.symbols.gmt__HALLMARK_G2M_CHE...  1.000000  1.150522   0.476684   0.206453       0.642   1/1   20.00%        BRCA2
4  prerank  h.all.v2023.1.Hs.symbols.gmt__HALLMARK_MITOTIC...  1.000000  1.150522   0.476684   0.206453       0.642   1/1   20.00%        BRCA2
5  prerank  h.all.v2023.1.Hs.symbols.gmt__HALLMARK_TNFA_SI... -0.750000 -0.972295   0.693548   1.000000       0.956   1/1   60.00%       NFE2L2
6  prerank  h.all.v2023.1.Hs.symbols.gmt__HALLMARK_WNT_BET... -0.666667 -0.866187   1.000000   1.000000       1.000   2/2   80.00%  TP53;CTNNB1
7  prerank   h.all.v2023.1.Hs.symbols.gmt__HALLMARK_APOPTOSIS -0.500000 -0.662316   1.000000   0.964899       1.000   1/1   80.00%       CTNNB1
8  prerank  h.all.v2023.1.Hs.symbols.gmt__HALLMARK_CHOLEST... -0.500000 -0.662316   1.000000   0.964899       1.000   1/1   80.00%       CTNNB1
9  prerank  h.all.v2023.1.Hs.symbols.gmt__HALLMARK_TGF_BET... -0.500000 -0.662316   1.000000   0.964899       1.000   1/1   80.00%       CTNNB1

guillaumeap avatar May 23 '25 13:05 guillaumeap

Hi @guillaumeap, sorry for the late reply. I was too busy to check this issue earlier.

Here is my output from v1.1.8, which is the same as v1.1.4.

Could you let me know if you changed the GMT file input?

Image

zqfang avatar May 28 '25 20:05 zqfang

Hello @zqfang, it's strange: I don't get the same results when running the exact same script as you. Could it be a pandas or Python version issue? I also noticed that lxml is needed to load msidb = Msigdb(); maybe you could add it to the requirements?

Python 3.11.4 (main, May 15 2025, 17:30:48) [Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from gseapy import Msigdb
>>> import pandas as pd
>>> import gseapy as gp
>>> gp.__version__
'1.1.8'
>>> msidb = Msigdb()
>>> hall = msidb.get_gmt()
>>> rnk = pd.DataFrame({"gene_name": ["TP53","NFE2L2","CTNNB1","KEAP1","BRCA2"],"rank":[0,1,2,3,4]})
>>> a=gp.prerank(rnk=rnk,gene_sets=hall,min_size=1)
>>> a.res2d
      Name                                 Term        ES       NES NOM p-val FDR q-val FWER p-val Tag %   Gene %   Lead_genes
0  prerank                 HALLMARK_E2F_TARGETS       1.0  1.320418  0.299145  0.260528      0.368   1/2   20.00%        BRCA2
1  prerank                 HALLMARK_P53_PATHWAY      -1.0 -1.320346  0.359016       1.0      0.595   1/1  100.00%         TP53
2  prerank                  HALLMARK_DNA_REPAIR      -1.0 -1.320346  0.359016       1.0      0.595   1/1  100.00%         TP53
3  prerank              HALLMARK_G2M_CHECKPOINT       1.0  1.147222  0.486683  0.297622       0.56   1/1   20.00%        BRCA2
4  prerank             HALLMARK_MITOTIC_SPINDLE       1.0  1.147222  0.486683  0.297622       0.56   1/1   20.00%        BRCA2
5  prerank     HALLMARK_TNFA_SIGNALING_VIA_NFKB     -0.75     -1.01  0.650165  0.873977      0.814   1/1   60.00%       NFE2L2
6  prerank  HALLMARK_WNT_BETA_CATENIN_SIGNALING -0.666667  -0.85755       1.0       1.0        1.0   2/2   80.00%  TP53;CTNNB1
7  prerank          HALLMARK_TGF_BETA_SIGNALING      -0.5 -0.667401       1.0   0.85527        1.0   1/1   80.00%       CTNNB1
8  prerank     HALLMARK_CHOLESTEROL_HOMEOSTASIS      -0.5 -0.667401       1.0   0.85527        1.0   1/1   80.00%       CTNNB1
9  prerank                   HALLMARK_APOPTOSIS      -0.5 -0.667401       1.0   0.85527        1.0   1/1   80.00%       CTNNB1
>>> pd.__version__
'2.2.3'

Here is the output of pip list after creating an empty venv and running pip install gseapy and pip install lxml:

Package            Version
------------------ -----------
certifi            2025.4.26
charset-normalizer 3.4.2
contourpy          1.3.2
cycler             0.12.1
fonttools          4.58.1
gseapy             1.1.8
idna               3.10
kiwisolver         1.4.8
lxml               5.4.0
matplotlib         3.10.3
numpy              2.2.6
packaging          25.0
pandas             2.2.3
pillow             11.2.1
pip                23.1.2
pyparsing          3.2.3
python-dateutil    2.9.0.post0
pytz               2025.2
requests           2.32.3
scipy              1.15.3
setuptools         65.5.0
six                1.17.0
tzdata             2025.2
urllib3            2.4.0
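When comparing environments across machines, the relevant versions can also be collected from within Python rather than from the full pip list. This is a small sketch using only the standard library; the helper name installed_versions is hypothetical, and the package names are simply those that seem relevant from the list above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent.

    Hypothetical helper for illustration, not part of GSEApy.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Package names taken from the pip list above; adjust as needed.
report = installed_versions(["gseapy", "pandas", "numpy", "scipy", "lxml"])
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```

Posting this short report from both environments makes it easier to spot a dependency that differs, or one that is missing, such as lxml.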

guillaumeap avatar Jun 02 '25 09:06 guillaumeap

Something unusual happened, and I'm unable to track it down. It may be due to compilation issues.

You can resolve this by compiling from source or by installing v1.1.9, which I have just uploaded and tested.

Let me know if you still have issues. Thank you.

zqfang avatar Jun 02 '25 17:06 zqfang