GSEApy icon indicating copy to clipboard operation
GSEApy copied to clipboard

Handling multiple classes (and FDR)

Open Tomnl opened this issue 1 year ago • 1 comments

I was wondering if there was a recommended way of using GSEApy when the analysis involves comparing more than 2 sample classes. e.g. control Vs class1, control Vs class2 ..... etc

It would be quite straightforward to perform repeated analysis using either prerank or with the gsea function. But I believe the FDR result might not be appropriate...

Potentially I could just combine the resulting dataframes and recalculate the FDR value using the Benjamini-Hochberg approach - see below. But I just wanted to check if this is appropriate and/or if it could be done another way within GSEApy, potentially using the gsea_fdr function.


res1 = gp.prerank(rnk=rnk1, gene_sets=gene_sets)
res2 = gp.prerank(rnk=rnk2, gene_sets=gene_sets)
res_all = pd.concat([res1, res2])

qvals = fdr_bh(res_all['pval'])

def fdr_bh(p):
    """Benjamini-Hochberg p-value correction for multiple hypothesis testing.
    https://stackoverflow.com/questions/7450957/how-to-implement-rs-p-adjust-in-python/33532498#33532498
    """
    p = np.asfarray(p)
    by_descend = p.argsort()[::-1]
    by_orig = by_descend.argsort()
    steps = float(len(p)) / np.arange(len(p), 0, -1)
    q = np.minimum(1, np.minimum.accumulate(steps * p[by_descend]))
    return q[by_orig]

Setup

I am using GSEApy version 0.9.16, Python version 3.9.7, and operating system Ubuntu.

import sys; print(sys.version)
import platform; print(platform.python_implementation()); print(platform.platform())
import gseapy; print(gseapy.__version__)
3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46) 
[GCC 9.4.0]
CPython
Linux-5.13.0-44-generic-x86_64-with-glibc2.31
0.9.16

Tomnl avatar Jul 22 '22 15:07 Tomnl

GSEA use permutation (gene label or class label) to generate background distribution and calculate p values.

From my perspective, I don't recommend you to combine differenct background distribution and perform BH correction ( just like you did (concat res1 and res2 and do BH correction), given rnk1 and rnk2 is not the same.

I have never try to compare the null distritbution using different rankings. if the null distribution are all similar, I think BH correction works perfect.

zqfang avatar Jul 25 '22 18:07 zqfang