Difference of results for simple prerank test between v1.1.4 and v1.1.8
Setup
I am reporting a problem with GSEApy version, Python version, and operating system as follows:
import sys; print(sys.version)
import platform; print(platform.python_implementation()); print(platform.platform())
import gseapy; print(gseapy.__version__)
3.11.4 (main, May 15 2025, 17:30:48) [Clang 15.0.0 (clang-1500.1.0.2.5)]
CPython
macOS-15.4.1-arm64-arm-64bit
1.1.8
Expected behaviour
I am running a simple prerank test:
import pandas as pd
rnk = pd.DataFrame({"gene_name":["TP53", "NFE2L2", "CTNNB1", "KEAP1", "BRCA2"], "rank":[0,1,2,3,4]})
gseapy.prerank(rnk=rnk, gene_sets=["h.all.v2023.1.Hs.symbols.gmt"])
With v1.1.8, this gives the following result:
pd.DataFrame({'Term': ['h.all.v2023.1.Hs.symbols.gmt__HALLMARK_E2F_TARGETS', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_DNA_REPAIR', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_P53_PATHWAY', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_MITOTIC_SPINDLE', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_G2M_CHECKPOINT', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_TNFA_SIGNALING_VIA_NFKB', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_WNT_BETA_CATENIN_SIGNALING', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_CHOLESTEROL_HOMEOSTASIS', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_TGF_BETA_SIGNALING', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_APOPTOSIS'], 'ES': [1.0, -1.0, -1.0, 1.0, 1.0, -0.75, -0.6666666666666666, -0.5, -0.5, -0.5], 'NES': [1.3204175679488308, -1.3203463203463204, -1.3203463203463204, 1.147222222222222, 1.147222222222222, -1.01, -0.8575498575498598, -0.6674008810572688, -0.6674008810572688, -0.6674008810572688], 'FDR q-val': [0.260528038402793, 1.0, 1.0, 0.2976216452105608, 0.2976216452105608, 0.8739769860316289, 1.0, 0.8552704448956987, 0.8552704448956987, 0.8552704448956987]})
Actual behaviour
With v1.1.4, the exact same script and data give the following result:
pd.DataFrame({'Term': ['h.all.v2023.1.Hs.symbols.gmt__HALLMARK_P53_PATHWAY', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_DNA_REPAIR', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_E2F_TARGETS', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_G2M_CHECKPOINT', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_MITOTIC_SPINDLE', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_TNFA_SIGNALING_VIA_NFKB', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_WNT_BETA_CATENIN_SIGNALING', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_APOPTOSIS', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_CHOLESTEROL_HOMEOSTASIS', 'h.all.v2023.1.Hs.symbols.gmt__HALLMARK_TGF_BETA_SIGNALING'], 'ES': [-1.0, -1.0, 1.0, 1.0, 1.0, -0.75, -0.6666666666666666, -0.5, -0.5, -0.5], 'NES': [-1.3708133971291865, -1.3708133971291865, 1.3529243392805943, 1.150521609538003, 1.150521609538003, -0.9722948248823836, -0.8661870503597152, -0.6623164763458401, -0.6623164763458401, -0.6623164763458401], 'FDR q-val': [0.2172547570663218, 0.2172547570663218, 0.1255722694571615, 0.2064530194026597, 0.2064530194026597, 1.0, 1.0, 0.9648993164603732, 0.9648993164603732, 0.9648993164603732]})
Steps to reproduce
The gene set order has changed, along with the NES values, the FDR q-values, and the other statistics.
The result of v1.1.7 is the same as that of v1.1.8, so something changed between v1.1.4 and v1.1.7, but I can't find what. Thank you for your help.
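To quantify the difference rather than compare the two results by eye, the NES values can be aligned by term. This is a minimal sketch in plain Python, using the NES values from the two results above with the common h.all.v2023.1.Hs.symbols.gmt__ prefix dropped for brevity. The helper name nes_shift is hypothetical and not part of GSEApy.

```python
# NES per gene set, copied from the v1.1.4 and v1.1.8 results above.
nes_v114 = {
    "HALLMARK_P53_PATHWAY": -1.3708133971291865,
    "HALLMARK_DNA_REPAIR": -1.3708133971291865,
    "HALLMARK_E2F_TARGETS": 1.3529243392805943,
    "HALLMARK_G2M_CHECKPOINT": 1.150521609538003,
    "HALLMARK_MITOTIC_SPINDLE": 1.150521609538003,
    "HALLMARK_TNFA_SIGNALING_VIA_NFKB": -0.9722948248823836,
    "HALLMARK_WNT_BETA_CATENIN_SIGNALING": -0.8661870503597152,
    "HALLMARK_APOPTOSIS": -0.6623164763458401,
    "HALLMARK_CHOLESTEROL_HOMEOSTASIS": -0.6623164763458401,
    "HALLMARK_TGF_BETA_SIGNALING": -0.6623164763458401,
}
nes_v118 = {
    "HALLMARK_E2F_TARGETS": 1.3204175679488308,
    "HALLMARK_DNA_REPAIR": -1.3203463203463204,
    "HALLMARK_P53_PATHWAY": -1.3203463203463204,
    "HALLMARK_MITOTIC_SPINDLE": 1.147222222222222,
    "HALLMARK_G2M_CHECKPOINT": 1.147222222222222,
    "HALLMARK_TNFA_SIGNALING_VIA_NFKB": -1.01,
    "HALLMARK_WNT_BETA_CATENIN_SIGNALING": -0.8575498575498598,
    "HALLMARK_CHOLESTEROL_HOMEOSTASIS": -0.6674008810572688,
    "HALLMARK_TGF_BETA_SIGNALING": -0.6674008810572688,
    "HALLMARK_APOPTOSIS": -0.6674008810572688,
}


def nes_shift(old, new):
    """Return {term: new NES - old NES} for terms present in both results."""
    return {term: new[term] - old[term] for term in old if term in new}


shift = nes_shift(nes_v114, nes_v118)

# Print the terms with the largest absolute change first.
for term, delta in sorted(shift.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{term:<40} {delta:+.6f}")
```

The ES values are identical term by term in both results, so only the normalized statistics differ, which suggests the change lies in the permutation-based normalization rather than in the enrichment score itself.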
Can you output the result as a dataframe instead of dicts (e.g. gseapy.prerank(...).res2d)? I'd like to see what happened. Maybe this is related to:
- issue #299
- gene name checking in >= v1.1.6
Sure, sorry about that.
Here is the input:
gene_name rank
0 TP53 0
1 NFE2L2 1
2 CTNNB1 2
3 KEAP1 3
4 BRCA2 4
Output for v1.1.8:
Name Term ES NES NOM p-val FDR q-val FWER p-val Tag % Gene % Lead_genes
0 prerank h.all.v2023.1.Hs.symbols.gmt__HALLMARK_E2F_TAR... 1.000000 1.320418 0.299145 0.260528 0.368 1/2 20.00% BRCA2
1 prerank h.all.v2023.1.Hs.symbols.gmt__HALLMARK_DNA_REPAIR -1.000000 -1.320346 0.359016 1.000000 0.595 1/1 100.00% TP53
2 prerank h.all.v2023.1.Hs.symbols.gmt__HALLMARK_P53_PAT... -1.000000 -1.320346 0.359016 1.000000 0.595 1/1 100.00% TP53
3 prerank h.all.v2023.1.Hs.symbols.gmt__HALLMARK_MITOTIC... 1.000000 1.147222 0.486683 0.297622 0.560 1/1 20.00% BRCA2
4 prerank h.all.v2023.1.Hs.symbols.gmt__HALLMARK_G2M_CHE... 1.000000 1.147222 0.486683 0.297622 0.560 1/1 20.00% BRCA2
5 prerank h.all.v2023.1.Hs.symbols.gmt__HALLMARK_TNFA_SI... -0.750000 -1.010000 0.650165 0.873977 0.814 1/1 60.00% NFE2L2
6 prerank h.all.v2023.1.Hs.symbols.gmt__HALLMARK_WNT_BET... -0.666667 -0.857550 1.000000 1.000000 1.000 2/2 80.00% TP53;CTNNB1
7 prerank h.all.v2023.1.Hs.symbols.gmt__HALLMARK_CHOLEST... -0.500000 -0.667401 1.000000 0.855270 1.000 1/1 80.00% CTNNB1
8 prerank h.all.v2023.1.Hs.symbols.gmt__HALLMARK_TGF_BET... -0.500000 -0.667401 1.000000 0.855270 1.000 1/1 80.00% CTNNB1
9 prerank h.all.v2023.1.Hs.symbols.gmt__HALLMARK_APOPTOSIS -0.500000 -0.667401 1.000000 0.855270 1.000 1/1 80.00% CTNNB1
Output for v1.1.4:
Name Term ES NES NOM p-val FDR q-val FWER p-val Tag % Gene % Lead_genes
0 prerank h.all.v2023.1.Hs.symbols.gmt__HALLMARK_P53_PAT... -1.000000 -1.370813 0.293194 0.217255 0.000 1/1 100.00% TP53
1 prerank h.all.v2023.1.Hs.symbols.gmt__HALLMARK_DNA_REPAIR -1.000000 -1.370813 0.293194 0.217255 0.000 1/1 100.00% TP53
2 prerank h.all.v2023.1.Hs.symbols.gmt__HALLMARK_E2F_TAR... 1.000000 1.352924 0.270804 0.125572 0.192 1/2 20.00% BRCA2
3 prerank h.all.v2023.1.Hs.symbols.gmt__HALLMARK_G2M_CHE... 1.000000 1.150522 0.476684 0.206453 0.642 1/1 20.00% BRCA2
4 prerank h.all.v2023.1.Hs.symbols.gmt__HALLMARK_MITOTIC... 1.000000 1.150522 0.476684 0.206453 0.642 1/1 20.00% BRCA2
5 prerank h.all.v2023.1.Hs.symbols.gmt__HALLMARK_TNFA_SI... -0.750000 -0.972295 0.693548 1.000000 0.956 1/1 60.00% NFE2L2
6 prerank h.all.v2023.1.Hs.symbols.gmt__HALLMARK_WNT_BET... -0.666667 -0.866187 1.000000 1.000000 1.000 2/2 80.00% TP53;CTNNB1
7 prerank h.all.v2023.1.Hs.symbols.gmt__HALLMARK_APOPTOSIS -0.500000 -0.662316 1.000000 0.964899 1.000 1/1 80.00% CTNNB1
8 prerank h.all.v2023.1.Hs.symbols.gmt__HALLMARK_CHOLEST... -0.500000 -0.662316 1.000000 0.964899 1.000 1/1 80.00% CTNNB1
9 prerank h.all.v2023.1.Hs.symbols.gmt__HALLMARK_TGF_BET... -0.500000 -0.662316 1.000000 0.964899 1.000 1/1 80.00% CTNNB1
Hi @guillaumeap, sorry for the late reply. I was too busy to look into this issue earlier.
Here is my output from v1.1.8, which is the same as v1.1.4.
Could you let me know if you changed the GMT file input?
Hello @zqfang, strangely, I don't get the same results when running the exact same script as you. Could it be a pandas or Python version issue?
I also noticed that lxml is needed to load msidb = Msigdb(). Maybe you could add it to the requirements?
Python 3.11.4 (main, May 15 2025, 17:30:48) [Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from gseapy import Msigdb
>>> import pandas as pd
>>> import gseapy as gp
>>> gp.__version__
'1.1.8'
>>> msidb = Msigdb()
>>> hall = msidb.get_gmt()
>>> rnk = pd.DataFrame({"gene_name": ["TP53","NFE2L2","CTNNB1","KEAP1","BRCA2"],"rank":[0,1,2,3,4]})
>>> a=gp.prerank(rnk=rnk,gene_sets=hall,min_size=1)
>>> a.res2d
Name Term ES NES NOM p-val FDR q-val FWER p-val Tag % Gene % Lead_genes
0 prerank HALLMARK_E2F_TARGETS 1.0 1.320418 0.299145 0.260528 0.368 1/2 20.00% BRCA2
1 prerank HALLMARK_P53_PATHWAY -1.0 -1.320346 0.359016 1.0 0.595 1/1 100.00% TP53
2 prerank HALLMARK_DNA_REPAIR -1.0 -1.320346 0.359016 1.0 0.595 1/1 100.00% TP53
3 prerank HALLMARK_G2M_CHECKPOINT 1.0 1.147222 0.486683 0.297622 0.56 1/1 20.00% BRCA2
4 prerank HALLMARK_MITOTIC_SPINDLE 1.0 1.147222 0.486683 0.297622 0.56 1/1 20.00% BRCA2
5 prerank HALLMARK_TNFA_SIGNALING_VIA_NFKB -0.75 -1.01 0.650165 0.873977 0.814 1/1 60.00% NFE2L2
6 prerank HALLMARK_WNT_BETA_CATENIN_SIGNALING -0.666667 -0.85755 1.0 1.0 1.0 2/2 80.00% TP53;CTNNB1
7 prerank HALLMARK_TGF_BETA_SIGNALING -0.5 -0.667401 1.0 0.85527 1.0 1/1 80.00% CTNNB1
8 prerank HALLMARK_CHOLESTEROL_HOMEOSTASIS -0.5 -0.667401 1.0 0.85527 1.0 1/1 80.00% CTNNB1
9 prerank HALLMARK_APOPTOSIS -0.5 -0.667401 1.0 0.85527 1.0 1/1 80.00% CTNNB1
>>> pd.__version__
'2.2.3'
Here is the output of pip list after creating an empty venv and running pip install gseapy, followed by pip install lxml:
Package Version
------------------ -----------
certifi 2025.4.26
charset-normalizer 3.4.2
contourpy 1.3.2
cycler 0.12.1
fonttools 4.58.1
gseapy 1.1.8
idna 3.10
kiwisolver 1.4.8
lxml 5.4.0
matplotlib 3.10.3
numpy 2.2.6
packaging 25.0
pandas 2.2.3
pillow 11.2.1
pip 23.1.2
pyparsing 3.2.3
python-dateutil 2.9.0.post0
pytz 2025.2
requests 2.32.3
scipy 1.15.3
setuptools 65.5.0
six 1.17.0
tzdata 2025.2
urllib3 2.4.0
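When two people run the same script and get different numbers, it helps if both post the same compact environment summary. The following is a minimal sketch using only the standard library. The helper names installed_version and environment_report are hypothetical, and the package list is simply the set of packages mentioned in this thread.

```python
import platform
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report(packages):
    """Collect interpreter, platform, and package versions into a dict."""
    report = {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
    }
    for name in packages:
        # Missing packages are reported explicitly instead of raising.
        report[name] = installed_version(name) or "not installed"
    return report


# Packages discussed in this thread; lxml is included because Msigdb() needs it.
for key, value in environment_report(["gseapy", "pandas", "numpy", "scipy", "lxml"]).items():
    print(f"{key:<15} {value}")
```

Unlike a full pip list, this keeps the report short and flags a missing optional dependency such as lxml directly.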
Something unusual happened, and I'm unable to track it down. It may be due to a compilation issue.
You can resolve this by compiling from source or by installing v1.1.9, which I have just uploaded and tested.
Let me know if you still have issues. Thank you.