GSEApy
GSEApy copied to clipboard
Handling multiple classes (and FDR)
I was wondering if there was a recommended way of using GSEApy when the analysis involves comparing more than 2 sample classes. e.g. control Vs class1, control Vs class2 ..... etc
It would be quite straightforward to perform repeated analysis using either prerank or with the gsea function. But I believe the FDR result might not be appropriate...
Potentially I could just combine the resulting dataframes and recalculate the FDR value using the Benjamini-Hochberg approach - see below. But I just wanted to check if this is appropriate and/or if it could be done another way within GSEApy, potentially using the gsea_fdr function.
res1 = gp.prerank(rnk=rnk1, gene_sets=gene_sets)
res2 = gp.prerank(rnk=rnk2, gene_sets=gene_sets)
res_all = pd.concat([res1, res2])
qvals = fdr_bh(res_all['pval'])
def fdr_bh(p):
"""Benjamini-Hochberg p-value correction for multiple hypothesis testing.
https://stackoverflow.com/questions/7450957/how-to-implement-rs-p-adjust-in-python/33532498#33532498
"""
p = np.asfarray(p)
by_descend = p.argsort()[::-1]
by_orig = by_descend.argsort()
steps = float(len(p)) / np.arange(len(p), 0, -1)
q = np.minimum(1, np.minimum.accumulate(steps * p[by_descend]))
return q[by_orig]
Setup
I am using GSEApy version 0.9.16, Python version 3.9.7, and operating system Ubuntu.
import sys; print(sys.version)
import platform; print(platform.python_implementation()); print(platform.platform())
import gseapy; print(gseapy.__version__)
3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46)
[GCC 9.4.0]
CPython
Linux-5.13.0-44-generic-x86_64-with-glibc2.31
0.9.16
GSEA use permutation (gene label or class label) to generate background distribution and calculate p values.
From my perspective, I don't recommend you to combine differenct background distribution and perform BH correction ( just like you did (concat res1 and res2 and do BH correction), given rnk1 and rnk2 is not the same.
I have never try to compare the null distritbution using different rankings. if the null distribution are all similar, I think BH correction works perfect.