clusterProfiler: Different results between different runs

I am running clusterProfiler and I get slightly different results every time I re-run the pipeline. Is there an option I can set so that it produces exactly the same output every time I run the GSEA? Below is the code I am using.

gsea_6_kegg <- gseKEGG(geneList = genekegg_6 , organism = 'hsa', nPerm = 1000, maxGSSize=200, minGSSize = 20, pAdjustMethod = "BH", pvalueCutoff = 0.05, verbose = TRUE)

Mar 05 '19 07:03 sanhe374

I am also seeing this when running the universal GSEA() function. I would like to know how to fix it and why it occurs. Thanks.

Jul 13 '19 19:07 jckhearn

I have the same problem when using the gseKEGG() function. Maybe it is caused by network instability? I am very confused and badly need a solution.

Many thanks!

Nov 22 '19 13:11 shitiezhu

Try nPerm = 10000. For GSEA, 10000 permutations are expected if you want a stable outcome.
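To illustrate why the permutation count matters, here is a minimal base R sketch of a toy permutation test. It is not clusterProfiler or fgsea code: perm_pvalue() and the simulated scores are hypothetical, chosen only to show that a permutation p-value is a random estimate whose run-to-run spread shrinks as the number of permutations grows, and that it repeats exactly only when the random seed is fixed.

```r
# Toy permutation test in base R (hypothetical, not clusterProfiler code).
perm_pvalue <- function(scores, in_set, n_perm) {
  observed <- mean(scores[in_set])
  # Null distribution: mean score of random gene sets of the same size.
  null <- replicate(n_perm, mean(sample(scores, sum(in_set))))
  # Add-one estimate so the p-value is never exactly zero.
  (sum(null >= observed) + 1) / (n_perm + 1)
}

set.seed(42)                        # fixes the simulated data only
scores <- rnorm(500)
in_set <- seq_along(scores) <= 30   # the first 30 "genes" form the set
# Shift the set so that its mean score is exactly 0.3 (mild enrichment).
scores[in_set] <- scores[in_set] - mean(scores[in_set]) + 0.3

# Repeat the test 20 times without reseeding: the estimate moves between runs.
p_1000  <- replicate(20, perm_pvalue(scores, in_set, 1000))
p_10000 <- replicate(20, perm_pvalue(scores, in_set, 10000))
sd(p_1000)    # spread with 1000 permutations
sd(p_10000)   # usually smaller with 10000 permutations

# Reseeding before each run makes the estimate identical.
set.seed(1234); a <- perm_pvalue(scores, in_set, 1000)
set.seed(1234); b <- perm_pvalue(scores, in_set, 1000)
identical(a, b)
```

More permutations narrow the spread but do not remove it, so a term whose adjusted p-value sits close to pvalueCutoff can still appear in one run and drop out of the next.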

Nov 27 '19 03:11 GuangchuangYu

To improve network stability, the GitHub version of clusterProfiler now downloads the KEGG data with curl instead of reading the file directly online.

I don't know whether this solves the problem. You are all welcome to test the new version.

Dec 03 '19 10:12 GuangchuangYu

Checking the downloaded KEGG pathways for completeness does not seem possible, because the API never returns the content-length attribute.

Dec 18 '19 14:12 GuangchuangYu

Even with 10000 permutations, the GSEA function returns different results on each run. A reproducible example is below:

library(clusterProfiler)
library(magrittr)  # provides the %>% pipe
data(geneList, package="DOSE")
m_t2g <- msigdbr::msigdbr(species = "Homo sapiens", category = "C3") %>%
  dplyr::select(gs_name, entrez_gene)
GSEA(geneList, TERM2GENE = m_t2g, minGSSize = 120, nPerm = 10000, pvalueCutoff = 0.05, seed = TRUE)

On 4 different runs of the GSEA function above, I've found a total of 129, 136, 131, or 138 enriched terms, even when using set.seed in R and specifying seed = TRUE in the GSEA function (it is unclear from the documentation what setting seed does). I suspect this is because this is using the underlying fgsea implementation of GSEA, but is there a way to keep consistency between runs?

Mar 08 '20 21:03 diegoalexespi

Has a solution surfaced for the seed issue? I have found that results vary between runs of the function as well, presumably due to the inability to designate a seed. Without this functionality, we have issues with reproducibility.

Jun 08 '20 14:06 jthmiller

I am also struggling with reproducibility, just using enrichKEGG(). I am at a bit of a loss here. Has anyone come across a solution? I am trying use_internal_data = TRUE, but I don't know whether it is working. I am looking for another tool at this point.

May 17 '21 09:05 lumotroph

I am also struggling with reproducibility using gseKEGG(). Wondering if anyone has seen any resolutions?

Oct 11 '22 23:10 EvelynZav

EvelynZav wrote: "I am also struggling with reproducibility using gseKEGG(). Wondering if anyone has seen any resolutions?"

Call set.seed(1234), or any other fixed number, in your code and then set seed = TRUE in your gseKEGG() call. At least this worked for me.
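One way to confirm whether this advice holds on a given setup is to run the analysis twice from the same seed and compare the results. The sketch below is an assumption-laden illustration: check_reproducible() and toy_analysis() are hypothetical helpers, not part of clusterProfiler, and the gseKEGG() call is left commented out because it needs the package, the genekegg_6 input from the first post, and network access to KEGG.

```r
# Hypothetical helper (not part of clusterProfiler): run a zero-argument
# function twice from the same RNG state and compare the two results.
check_reproducible <- function(run, seed = 1234) {
  set.seed(seed)
  first <- run()
  set.seed(seed)
  second <- run()
  identical(first, second)
}

# Base R stand-in for a permutation-based analysis.
toy_analysis <- function() mean(sample(100, 10))
check_reproducible(toy_analysis)

# With clusterProfiler installed, the call from this thread could be wrapped
# the same way (commented out: it needs genekegg_6 and access to KEGG).
# library(clusterProfiler)
# check_reproducible(function() {
#   res <- gseKEGG(geneList = genekegg_6, organism = "hsa",
#                  minGSSize = 20, maxGSSize = 200,
#                  pAdjustMethod = "BH", pvalueCutoff = 0.05, seed = TRUE)
#   as.data.frame(res)
# })
```

If the wrapped call returns FALSE, the remaining variation probably comes from something other than the R seed, for example an incomplete KEGG download as discussed earlier in the thread.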

Nov 15 '22 15:11 L3ft2di3

I also have this problem with the stability of running gseKegg(). Any other suggestions?

Jun 26 '23 22:06 qdong2023

Do you mean gseKEGG()? As far as I know, gseKegg() is not a function in clusterProfiler. If so, the answer I gave above should work for you:

Call set.seed(1234), or any other fixed number, in your code and then set seed = TRUE in your gseKEGG() call. At least this worked for me.

Jun 30 '23 07:06 L3ft2di3
