
Different results between different runs

Open sanhe374 opened this issue 5 years ago • 12 comments

I am running clusterProfiler and I get slightly different results every time I re-run the pipeline. Is there an option that makes it produce exactly the same output every time I run the GSEA? Below is the code I am using.

gsea_6_kegg <- gseKEGG(geneList = genekegg_6 , organism = 'hsa', nPerm = 1000, maxGSSize=200, minGSSize = 20, pAdjustMethod = "BH", pvalueCutoff = 0.05, verbose = TRUE)

sanhe374 avatar Mar 05 '19 07:03 sanhe374

I am also seeing this when running the universal GSEA, and I would also like to know how to fix it and why it occurs. Thanks.

jckhearn avatar Jul 13 '19 19:07 jckhearn

Got the same problem when using the gseKEGG() function! Maybe it is caused by network instability... Very confused! This badly needs to be solved!

Many thanks!

shitiezhu avatar Nov 22 '19 13:11 shitiezhu

Try nPerm = 10000. For GSEA, 10000 permutations are expected if you want a stable outcome.
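To illustrate why more permutations give a more stable outcome, here is a minimal base R sketch. It is a toy mean-difference permutation test, not clusterProfiler's GSEA; the function name `perm_pvalue` and the simulated data are assumptions made up for this illustration. It repeats the same test several times at two permutation counts and compares how much the estimated p-value moves between runs.

```r
# Toy illustration only: a mean-difference permutation test,
# not the algorithm used by clusterProfiler or fgsea.
perm_pvalue <- function(x, y, n_perm) {
  observed <- mean(x) - mean(y)
  pooled <- c(x, y)
  n_x <- length(x)
  perm_stats <- replicate(n_perm, {
    shuffled <- sample(pooled)
    mean(shuffled[seq_len(n_x)]) - mean(shuffled[-seq_len(n_x)])
  })
  # Add-one correction keeps the estimate away from exactly zero.
  (sum(abs(perm_stats) >= abs(observed)) + 1) / (n_perm + 1)
}

set.seed(42)  # fixes the simulated data and the repeated runs below
x <- rnorm(30, mean = 0.3)
y <- rnorm(30)

# Repeat the whole test 20 times per setting; each repeat uses fresh permutations.
p_small <- replicate(20, perm_pvalue(x, y, n_perm = 200))
p_large <- replicate(20, perm_pvalue(x, y, n_perm = 5000))

# Run-to-run spread of the estimated p-value at each permutation count.
print(sd(p_small))
print(sd(p_large))
```

The spread shrinks roughly with the square root of the permutation count, so a p-value that sits near the cutoff can still cross it between runs. More permutations reduce the variation but do not remove it.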

GuangchuangYu avatar Nov 27 '19 03:11 GuangchuangYu

For network stability, the GitHub version of clusterProfiler now downloads the KEGG data with curl instead of reading the file online directly.

I don't know whether this solves the problem. You are all welcome to test the new version.

GuangchuangYu avatar Dec 03 '19 10:12 GuangchuangYu

Checking the completeness of the downloaded KEGG pathways does not seem possible, as the API never returns a content-length attribute.

GuangchuangYu avatar Dec 18 '19 14:12 GuangchuangYu

Even with 10000 permutations, the GSEA function returns different results on each run. Reproducible example below:

library(magrittr)         # provides %>%
library(clusterProfiler)  # provides GSEA()
data(geneList, package="DOSE")
m_t2g <- msigdbr::msigdbr(species = "Homo sapiens", category = "C3") %>% 
  dplyr::select(gs_name, entrez_gene)
GSEA(geneList, TERM2GENE = m_t2g, minGSSize = 120, nPerm = 10000, pvalueCutoff = 0.05, seed = TRUE)

On 4 different runs of the GSEA function above, I found 129, 136, 131, and 138 enriched terms, even when using set.seed in R and specifying seed = TRUE in the GSEA function (the documentation does not make clear what setting seed does). I suspect this is because GSEA uses the underlying fgsea implementation, but is there a way to keep results consistent between runs?

diegoalexespi avatar Mar 08 '20 21:03 diegoalexespi

Has a solution surfaced for the seed issue? I have found that results vary between runs of the function as well, presumably due to the inability to designate a seed. Without this functionality, we have issues with reproducibility.

jthmiller avatar Jun 08 '20 14:06 jthmiller

Also struggling with reproducibility just using enrichKEGG(). At a bit of a loss here - has anyone come across a solution? I'm trying with use_internal_data = TRUE but don't know if it's working or not. Looking for another tool at this point.

lumotroph avatar May 17 '21 09:05 lumotroph

I am also struggling with reproducibility using gseKEGG(). Wondering if anyone has seen any resolutions?

EvelynZav avatar Oct 11 '22 23:10 EvelynZav

> I am also struggling with reproducibility using gseKEGG(). Wondering if anyone has seen any resolutions?

Use set.seed(1234) - or any other fixed integer - in your code before the call, and then set seed = TRUE in your gseKEGG(). At least this worked for me.
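One way to check whether this works on your own data is to run the analysis twice with the same seed and compare the results. The sketch below is only an illustration: `is_reproducible`, `toy_run` and `run_gsea` are hypothetical helpers, not clusterProfiler functions, and `genekegg_6` is the ranked gene list from the original post. The toy stand-in lets the helper run without clusterProfiler or network access.

```r
# Hypothetical helper (not part of clusterProfiler): call `run` twice,
# resetting the RNG to the same seed before each call, and compare the results.
is_reproducible <- function(run, seed = 1234) {
  set.seed(seed)
  first <- run()
  set.seed(seed)
  second <- run()
  identical(first, second)
}

# Stand-in for a permutation-based analysis, so the sketch runs on its own.
toy_run <- function() sample(100, 10)

print(is_reproducible(toy_run))

# The real call would be wrapped the same way. genekegg_6 is assumed to be the
# ranked gene list from the original post; seed = TRUE is the argument
# suggested in this comment.
run_gsea <- function() {
  res <- clusterProfiler::gseKEGG(geneList = genekegg_6, organism = "hsa",
                                  nPerm = 10000, minGSSize = 20, maxGSSize = 200,
                                  pAdjustMethod = "BH", pvalueCutoff = 0.05,
                                  seed = TRUE)
  as.data.frame(res)
}
# is_reproducible(run_gsea)  # needs clusterProfiler and access to KEGG
```

If the check still fails with the seed fixed, the difference may come from somewhere other than the permutations, for example from the KEGG download discussed earlier in this thread.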

L3ft2di3 avatar Nov 15 '22 15:11 L3ft2di3

I also have this problem with the stability of running gseKegg(). Any other suggestions?

qdong2023 avatar Jun 26 '23 22:06 qdong2023

Do you mean gseKEGG()? Then the following should work for you:

> I am also struggling with reproducibility using gseKEGG(). Wondering if anyone has seen any resolutions?
>
> use set.seed(1234) - or any other random number - in your code and then set seed = TRUE in your gseKEGG(). At least this worked for me.

gseKegg() is not a function of clusterProfiler - at least as far as I know.

L3ft2di3 avatar Jun 30 '23 07:06 L3ft2di3