clusterProfiler
Different results between different runs
I am running clusterProfiler and I get slightly different results every time I re-run the pipeline. Is there an option I can use so that it produces exactly the same output every time I run the GSEA? Below is the code I am using.
gsea_6_kegg <- gseKEGG(geneList = genekegg_6 , organism = 'hsa', nPerm = 1000, maxGSSize=200, minGSSize = 20, pAdjustMethod = "BH", pvalueCutoff = 0.05, verbose = TRUE)
I am also seeing this when running the universal GSEA() function, so I would like to know whether the same option applies there and why this occurs. Thanks
Got the same problem when using the gseKEGG() function! Maybe it is because of network instability... Very confused! This badly needs to be solved!
Many thanks!
Try nPerm = 10000. For GSEA, 10,000 permutations are expected if you want a stable outcome.
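To illustrate why permutation-based p-values move between runs, and why more permutations or a fixed seed help, here is a minimal base R sketch. It is a toy permutation test, not clusterProfiler's or fgsea's algorithm; the function perm_pvalue and the simulated scores are hypothetical and exist only for this illustration.

```r
# Toy permutation test: is the mean score of a gene set unusually high?
# This is NOT clusterProfiler's algorithm, only an illustration of why
# permutation-based p-values change between runs.
perm_pvalue <- function(scores, in_set, n_perm) {
  observed <- mean(scores[in_set])
  # Null distribution: mean score of randomly drawn sets of the same size
  null_means <- replicate(n_perm, mean(sample(scores, sum(in_set))))
  # Add-one correction keeps the estimate away from exactly zero
  (sum(null_means >= observed) + 1) / (n_perm + 1)
}

set.seed(42)                       # fixes the toy data only
scores <- rnorm(500)               # hypothetical per-gene statistics
in_set <- seq_along(scores) <= 25  # hypothetical gene set: first 25 genes
scores[in_set] <- scores[in_set] + 0.4

# Unseeded: each call draws fresh permutations, so estimates can differ
p_unseeded <- replicate(5, perm_pvalue(scores, in_set, 1000))

# Seeded: resetting the RNG before each call reproduces the same estimate
p_seeded <- replicate(5, {
  set.seed(1234)
  perm_pvalue(scores, in_set, 1000)
})

print(p_unseeded)
print(p_seeded)
```

Because each p-value is an estimate from random draws, terms sitting close to pvalueCutoff can fall on either side of it from run to run. That would be consistent with the number of enriched terms changing slightly between runs.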
For network stability, the GitHub version of clusterProfiler now downloads KEGG data using curl instead of reading the file online directly.
I don't know whether this solves the problem. You are all welcome to test with the new version.
Checking the completeness of downloaded KEGG pathways does not seem possible, as the API never returns the content-length attribute.
Even with 10000 permutations, the GSEA function returns different results on each run. Reproducible example below:
library(clusterProfiler)  # provides GSEA()
library(dplyr)            # provides %>%
data(geneList, package="DOSE")
m_t2g <- msigdbr::msigdbr(species = "Homo sapiens", category = "C3") %>%
  dplyr::select(gs_name, entrez_gene)
GSEA(geneList, TERM2GENE = m_t2g, minGSSize = 120, nPerm = 10000, pvalueCutoff = 0.05, seed = TRUE)
On 4 different runs of the GSEA function above, I found 129, 136, 131, and 138 enriched terms, even when using set.seed in R and specifying seed = TRUE in the GSEA function (it is unclear from the documentation what setting seed does). I suspect this is because it uses the underlying fgsea implementation of GSEA, but is there a way to keep results consistent between runs?
Has a solution surfaced for the seed issue? I have found that results vary between runs of the function as well, presumably due to the inability to designate a seed. Without this functionality, we have issues with reproducibility.
Also struggling with reproducibility just using enrichKEGG(). At a bit of a loss here. Has anyone come across a solution? I'm trying use_internal_data = TRUE but don't know if it's working or not. Looking for another tool at this point.
I am also struggling with reproducibility using gseKEGG(). Wondering if anyone has seen any resolutions?
Use set.seed(1234), or any other fixed number, in your code and then set seed = TRUE in your gseKEGG() call. At least this worked for me.
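The suggestion above can be sketched as follows. This is a minimal example under several assumptions: it uses the example geneList shipped with DOSE rather than anyone's real data, the wrapper run_gsea is a hypothetical helper, the call needs network access to KEGG, and nPerm is left out because not every clusterProfiler version accepts it. Whether the two runs actually match may depend on the installed clusterProfiler, DOSE, and fgsea versions.

```r
library(clusterProfiler)
data(geneList, package = "DOSE")   # example ranked gene list shipped with DOSE

# Hypothetical helper: reset the RNG, then run gseKEGG with seed = TRUE
run_gsea <- function() {
  set.seed(1234)                   # reset R's RNG before every run
  gseKEGG(geneList     = geneList,
          organism     = "hsa",
          minGSSize    = 20,
          maxGSSize    = 200,
          pvalueCutoff = 0.05,
          pAdjustMethod = "BH",
          seed         = TRUE,
          verbose      = FALSE)
}

res1 <- as.data.frame(run_gsea())
res2 <- as.data.frame(run_gsea())

# Compare the enriched pathways and their p-values across the two runs
print(identical(res1$ID, res2$ID))
print(all.equal(res1$pvalue, res2$pvalue))
```

If the two comparisons do not both come back TRUE, the remaining variation is coming from somewhere other than R's random number generator, for example from the KEGG download.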
I also have this problem with the stability of running gseKegg(). Any other suggestions?
Do you mean gseKEGG()? Then the following should work for you:
Use set.seed(1234), or any other fixed number, in your code and then set seed = TRUE in your gseKEGG() call. At least this worked for me.
gseKegg() is not a function of clusterProfiler - at least as far as I know.