DOSE Rationale for decreasing order only in GSEA

Prerequisites

[x] Have you read Feedback and followed the guide?
- [x] make sure you are using the latest release version
- [x] read the documents
- [x] google your question/issue

Describe your issue

In performing GSEA, it seems the default is to have the genes sorted in decreasing order:

DOSE:::is.sorted
function (x, decreasing = TRUE) 
{
    all(sort(x, decreasing = decreasing) == x)
}

However, per the original paper, it seems the direction of the sort (ascending or descending) should not matter. I would like to know whether a gene list sorted in ascending order could be supported or, if not, what the rationale is for requiring decreasing order only.
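A toy illustration of the point about sort direction, written as a minimal sketch and not as the DOSE or fgsea implementation: `toy_es`, the gene names, and the values are all made up for this example. It computes a weighted running-sum score in the style described for classic GSEA and walks the same list in both directions. Because the running sum returns to zero at the end of the list, walking it in reverse should give the same magnitude with the opposite sign.

```r
# Toy weighted running-sum enrichment score (hypothetical helper, not a DOSE function).
# `stats`: named numeric vector, in the order the list is walked.
# `set`:   character vector of gene names in the gene set.
toy_es <- function(stats, set, p = 1) {
  hit <- names(stats) %in% set
  w <- abs(stats)^p
  step_hit <- ifelse(hit, w / sum(w[hit]), 0)   # steps up at set members
  step_miss <- ifelse(hit, 0, 1 / sum(!hit))    # steps down elsewhere
  running <- cumsum(step_hit - step_miss)
  running[which.max(abs(running))]              # largest deviation from zero
}

# Made-up gene-level statistics and gene set.
stats <- c(g1 = 3.2, g2 = 2.1, g3 = 0.4, g4 = -0.7, g5 = -1.9, g6 = -2.8)
set <- c("g1", "g2", "g4")

es_desc <- toy_es(sort(stats, decreasing = TRUE), set)
es_asc <- toy_es(sort(stats, decreasing = FALSE), set)

# Same magnitude, opposite sign in this toy example.
isTRUE(all.equal(es_desc, -es_asc))
```

Under these assumptions the direction only changes the sign convention, which is consistent with a package fixing one direction so that positive and negative scores have a single meaning.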

This could be related to https://github.com/YuLab-SMU/clusterProfiler/issues/214, https://github.com/YuLab-SMU/clusterProfiler/issues/91, and https://github.com/YuLab-SMU/clusterProfiler/issues/48.

Thank you,

NelsonGon

[x] Make a reproducible example (e.g. 1)
[x] your code should contain comments to describe the problem (e.g. what expected and actually happened?)

Ask in right place

[ ] for bugs or feature requests, post here (github issue)
[ ] for questions, please post to Bioconductor or Biostars with tag DOSE

Dec 02 '21 09:12 Nelson-Gon

It must be descending in DOSE.
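For readers hitting the same check, a minimal base R sketch of preparing a list that passes it. The gene names and values are hypothetical, and `is_sorted` simply copies the logic of the `DOSE:::is.sorted` helper quoted in the issue; nothing here calls DOSE itself.

```r
# Hypothetical gene-level statistics, deliberately unsorted.
gene_stats <- c(g3 = 0.4, g1 = 3.2, g6 = -2.8, g2 = 2.1)

# Same logic as the helper quoted in the issue.
is_sorted <- function(x, decreasing = TRUE) {
  all(sort(x, decreasing = decreasing) == x)
}

is_sorted(gene_stats)                  # the unsorted input fails the check

# Sort in decreasing order so the check passes.
ranked <- sort(gene_stats, decreasing = TRUE)
is_sorted(ranked)

# One possible way to look at the opposite direction (an assumption, not a
# documented DOSE feature): negate the statistic, then sort decreasing again.
flipped <- sort(-gene_stats, decreasing = TRUE)
names(flipped)
```

Negating the statistic keeps the list in the required decreasing order while reversing which genes sit at the top, so the signs of the resulting scores would need to be interpreted accordingly.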

Dec 05 '21 15:12 huerqiang

> It must be descending in DOSE.

I set by = "fgsea" but still cannot use a gene list sorted in ascending order. Does fgsea also require descending order?

Dec 05 '21 16:12 Nelson-Gon

Hello, I use the GSEA function from clusterProfiler, following the tutorial: http://yulab-smu.top/biomedical-knowledge-mining-book/universal-api.html. The tutorial works well. When I run my own data, I get the same warnings as in https://github.com/YuLab-SMU/clusterProfiler/issues/214. That issue suggests using rank rather than log2fc, because log2fc may contain identical values, which cause the warnings. However, when I run GSEA() with rank, I get another warning: "unbalanced (positive and negative) gene-level statistic values", which says I could use scoreType = "pos" to run GSEA. After setting this parameter, I found that the results were not the same, especially between em2 and the results based on gl_df. Here is my code.

library(tidyverse)
library(clusterProfiler)

data(geneList, package="DOSE")
head(geneList)

m_t2g <- msigdbr::msigdbr(species = "Homo sapiens", category = "C2") %>%
  dplyr::select(gs_name, entrez_gene)
em2 <- GSEA(geneList, TERM2GENE = m_t2g)
head(em2)
dotplot(em2)

gl_df <- data.frame(gene = names(geneList), log2fc = geneList)
gl_df <- gl_df %>%
  dplyr::mutate(rank = rank(log2fc, ties.method = 'random')) %>%
  arrange(desc(rank))

gl <- gl_df$rank
names(gl) <- gl_df$gene
head(gl)
em2_gl <- GSEA(gl, TERM2GENE = m_t2g)

dotplot(em2_gl)

em2_gl2 <- GSEA(gl, TERM2GENE = m_t2g, scoreType = "pos")
dotplot(em2_gl2)

plot from em2:

plot from em2_gl:

plot from em2_gl2:

I do not think this is normal. At present, I use a workaround to solve this problem. Here is my code:

dplyr::mutate(rank = rank(avg_log2FC, ties.method = "random"), avg_log2FC = avg_log2FC + rank * (1e-15)) %>% arrange(desc(avg_log2FC))

It adds a small value to every log2FC so that each value is unique.
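The tie-breaking idea can be sketched in base R as follows. This is an illustration under stated assumptions rather than the exact workaround: the vector `log2fc` and its gene names are made up, the offset `eps = 1e-9` is a hypothetical choice (an offset needs to be large enough relative to the values to survive floating-point rounding), and `ties.method = "first"` is used instead of "random" so the result is reproducible.

```r
# Made-up log2 fold changes with two pairs of tied values.
log2fc <- c(a = 1.5, b = 1.5, c = 0.2, d = -0.3, e = -0.3, f = -2.0)

# Ties are what trigger the warning discussed in the thread.
anyDuplicated(log2fc) > 0

# Hypothetical offset; small enough not to reorder distinct values here.
eps <- 1e-9

# Rank the values (ties broken by position), then nudge each value by rank * eps.
r <- rank(log2fc, ties.method = "first")
adjusted <- sort(log2fc + r * eps, decreasing = TRUE)

# Every value is now unique and the list is in decreasing order.
anyDuplicated(adjusted) == 0
names(adjusted)
```

This keeps the original fold-change scale, so the statistic still has positive and negative values, unlike a pure rank, which is always positive.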

I hope you can share more help or information. Thank you!

Mar 11 '22 07:03 shanshenbing

@shanshenbing This warning has no effect on the accuracy of the results.

Mar 13 '22 08:03 huerqiang