
Some questions about the 'maxGSSize' parameter in the enricher_interval series of functions

Open huangwb8 opened this issue 4 years ago • 1 comments

Hi~

Recently I realized that there is a somewhat hidden parameter called maxGSSize, which really influences the results of enricher/GSEA analysis. According to the source code of DOSE, I think it is involved in selecting gene sets, based on the number of genes they contain, before the enrichment analysis (such as functional enrichment in GO/KEGG, or GSEA) is run.

In practice, a larger maxGSSize means more gene sets are evaluated, and this sometimes gives better results.
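
To make concrete what I mean by gene set selection, here is a minimal sketch in Python. It is not DOSE's actual R code: the function name `filter_gene_sets`, the toy gene sets, and the lower bound of 10 are my own assumptions for illustration. It only shows the size filter as I understand it, where a gene set is kept when its size lies between a minimum and maxGSSize.

```python
# Hypothetical illustration of a gene-set size filter; NOT DOSE's real code.
def filter_gene_sets(gene_sets, min_gs_size=10, max_gs_size=500):
    """Keep only gene sets whose number of unique genes is within the bounds."""
    return {
        name: genes
        for name, genes in gene_sets.items()
        if min_gs_size <= len(set(genes)) <= max_gs_size
    }


# Toy gene sets of different sizes (made-up names and gene IDs).
gene_sets = {
    "small_set": [f"g{i}" for i in range(5)],
    "medium_set": [f"g{i}" for i in range(200)],
    "large_set": [f"g{i}" for i in range(1500)],
}

# With an upper bound of 500, the 1500-gene set is dropped before any test is run.
kept_default = filter_gene_sets(gene_sets)

# Raising the upper bound lets the large set through to the enrichment step.
kept_relaxed = filter_gene_sets(gene_sets, max_gs_size=2000)

print(sorted(kept_default))
print(sorted(kept_relaxed))
```

If the real filter works like this, raising maxGSSize changes which gene sets are tested at all, not how each one is scored.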

Here are my questions:

  • Why is the default of 'maxGSSize' 500? As far as I know, many gene sets (for example, in MSigDB) contain more than 500 genes, some of them thousands. Is it because larger gene sets are not suitable for this kind of analysis (GO/KEGG/GSEA)?
  • Is it reasonable/recommended to set a larger value for maxGSSize in practice?

Thanks~

huangwb8 · Nov 04 '19 01:11