Rationale for decreasing order only in GSEA
Prerequisites
- [x] Have you read Feedback and followed the guide?
- [x] Make sure you are using the latest release version
- [x] Read the documents
- [x] Google your question/issue
Describe your issue
In performing GSEA, it seems the default is to have the genes sorted in decreasing order:
DOSE:::is.sorted
function (x, decreasing = TRUE)
{
    all(sort(x, decreasing = decreasing) == x)
}
However, per the original paper, the direction of the ordering (ascending or descending) should not matter. I would like to know whether a gene list sorted in non-decreasing order could be supported, or else what the rationale is for accepting only a decreasing sort.
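For illustration, the check quoted above can be reproduced in base R. The sketch below copies its logic into a local function (is_sorted_desc is a hypothetical name, used to avoid relying on the unexported DOSE:::is.sorted) and applies it to a made-up toy vector rather than real data. It shows that an ascending vector fails the check and that sort(x, decreasing = TRUE) is enough to pass it.

```r
# Local copy of the check quoted above; is_sorted_desc is a hypothetical name.
is_sorted_desc <- function(x, decreasing = TRUE) {
  all(sort(x, decreasing = decreasing) == x)
}

# Toy gene-level statistics (made-up values), named by gene ID.
stats <- c(geneA = -1.2, geneB = 0.4, geneC = 2.5, geneD = -0.3)

ascending  <- sort(stats)                     # smallest first
descending <- sort(stats, decreasing = TRUE)  # largest first

is_sorted_desc(ascending)   # FALSE: ascending input fails the check
is_sorted_desc(descending)  # TRUE: descending input passes
```

Note that sort() keeps the names attached to each value, so the gene IDs stay matched to their statistics after reordering.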
This could be related to https://github.com/YuLab-SMU/clusterProfiler/issues/214, https://github.com/YuLab-SMU/clusterProfiler/issues/91, and https://github.com/YuLab-SMU/clusterProfiler/issues/48.
Thank you,
NelsonGon
- [x] Make a reproducible example (e.g. 1)
- [x] your code should contain comments to describe the problem (e.g. what expected and actually happened?)
Ask in right place
- [ ] for bugs or feature requests, post here (github issue)
- [ ] for questions, please post to Bioconductor or Biostars with tag DOSE
It must be descending in DOSE.
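If the aim is to rank genes from the opposite end, one possible approach (an assumption, not something confirmed in this thread) is to keep the list descending but flip the sign of the statistic first. The base R sketch below uses made-up values and only demonstrates that this reverses the gene order while still satisfying the descending requirement; it does not call GSEA itself.

```r
# Toy gene-level statistics (made-up values, no ties), named by gene ID.
stats <- c(geneA = -1.2, geneB = 0.4, geneC = 2.5, geneD = -0.3)

# Usual input: sorted in decreasing order.
forward <- sort(stats, decreasing = TRUE)

# Flip the sign, then sort in decreasing order again.
flipped <- sort(-stats, decreasing = TRUE)

names(forward)  # "geneC" "geneB" "geneD" "geneA"
names(flipped)  # "geneA" "geneD" "geneB" "geneC"
```

With this input, any enrichment reported for the flipped list would need to be read with the opposite sign relative to the original statistic.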
I set by = "fgsea" but still cannot use a gene list sorted in ascending order. Does fgsea also accept only descending order?
Hello, I am using the GSEA function from clusterProfiler, following the tutorial at http://yulab-smu.top/biomedical-knowledge-mining-book/universal-api.html. The tutorial itself works well. When I run my own data, I get the same warnings as in https://github.com/YuLab-SMU/clusterProfiler/issues/214. That issue suggests using rank rather than log2FC, because log2FC may contain identical values (ties), which cause the warnings. However, when I run GSEA() on ranks, I get a different warning: "unbalanced (positive and negative) gene-level statistic values", which tells me that I could use scoreType = "pos". After setting this parameter, the results are not the same, especially between em2 and gl_df. Here is my code.
library(tidyverse)
library(clusterProfiler)

data(geneList, package="DOSE")
head(geneList)

m_t2g <- msigdbr::msigdbr(species = "Homo sapiens", category = "C2") %>%
  dplyr::select(gs_name, entrez_gene)
em2 <- GSEA(geneList, TERM2GENE = m_t2g)
head(em2)
dotplot(em2)

gl_df <- data.frame(gene = names(geneList), log2fc = geneList)
gl_df <- gl_df %>%
  dplyr::mutate(rank = rank(log2fc, ties.method = 'random')) %>%
  arrange(desc(rank))

gl <- gl_df$rank
names(gl) <- gl_df$gene
head(gl)
em2_gl <- GSEA(gl, TERM2GENE = m_t2g)
dotplot(em2_gl)

em2_gl2 <- GSEA(gl, TERM2GENE = m_t2g, scoreType = "pos")
dotplot(em2_gl2)
plot from em2:

plot from em2_gl:

plot from em2_gl2:

I do not think this is normal. For now, I use a workaround. Here is my code:
dplyr::mutate(rank = rank(avg_log2FC, ties.method = "random"), avg_log2FC = avg_log2FC + rank * 1e-15) %>% arrange(desc(avg_log2FC))
This adds a small, rank-dependent value to every log2FC so that each value is unique.
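To make the difference between the two approaches concrete, here is a small base R sketch on made-up values (the vector lfc and its gene names are hypothetical). It shows that replacing log2FC with ranks removes ties but also makes every value positive, which is presumably what triggers the scoreType warning, whereas adding a tiny rank-based offset removes ties while keeping the sign of each value. It does not call GSEA itself.

```r
set.seed(1)  # ties.method = "random" is stochastic

# Toy log2 fold changes (made-up), with a tie between geneB and geneC.
lfc <- c(geneA = 2.0, geneB = 0.5, geneC = 0.5, geneD = -1.5)
anyDuplicated(lfc) > 0  # TRUE: ties present

# Approach 1: replace values with ranks. Ties disappear, but so do the signs.
rk <- rank(lfc, ties.method = "random")
anyDuplicated(rk) > 0   # FALSE
all(rk > 0)             # TRUE: every value is now positive

# Approach 2: add a tiny rank-based offset to break ties, then sort descending.
adj <- sort(lfc + rk * 1e-15, decreasing = TRUE)
anyDuplicated(adj) > 0                   # FALSE
all(sign(adj) == sign(lfc[names(adj)]))  # TRUE: signs preserved
```

Because ranks discard both sign and magnitude, results computed from them are not expected to match results computed from the original log2FC values.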
I hope you can share more help or information. Thank you!
@shanshenbing This warning has no effect on the accuracy of the results.