
Understanding the scoreType

Open alkurowska opened this issue 8 months ago • 2 comments

Hello,

I am running fgsea on my custom_pathways. I have 445 pathways with rather large gene sets (see the histogram plot).

[Figure: custom_pathways_distribution]

Additionally, the logFC values that I use for ranking the genes are skewed to the negative side (see the logFC distribution plot).

[Figure: DEA_logFC]

For some of the pathways I get an error message indicating that p-values were not calculated properly due to unbalanced gene-level statistic values. This can result in NA values for pval, padj, NES, and log2err, and the message suggests increasing the number of permutations. However, after re-running the analysis with a higher number of permutations, the results did not change and I got the same error.

Upon reading more about your tool, I decided to use the "pos" or "neg" scoreType. This removed the error for all pathways, even when I used the "pos" scoreType on my data, which is rather skewed towards the negative. The ES values from the initial run with default parameters were mostly negative. With the "pos" scoreType, those pathways ended up with a very low ES, close to zero, whereas the pathways whose initial ES was positive ended up with high positive values. As I understand it, with "pos" the tool takes the maximum positive enrichment for each pathway, regardless of the absolute maximum? So is the question I am now investigating the degree of overrepresentation of the pathways in my data, rather than the maximum enrichment in general?
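The reading described above can be illustrated with a minimal Python sketch of a GSEA-style running sum. This is not fgsea's implementation: the function name `enrichment_score`, its parameters, and the toy data are hypothetical, and the tie handling for the "std" case is an assumption. It only shows how a "pos" score (largest positive deviation), a "neg" score (largest negative deviation), and a "std" score (whichever deviation is larger in absolute value) can differ for the same pathway.

```python
import numpy as np

def enrichment_score(stats, in_set, score_type="std", gsea_param=1.0):
    """Illustrative GSEA-style enrichment score (hypothetical helper, not fgsea code).

    stats      -- gene-level statistics (e.g. logFC), one per gene
    in_set     -- boolean mask, True where the gene belongs to the pathway
    score_type -- "std", "pos" or "neg"
    """
    stats = np.asarray(stats, dtype=float)
    in_set = np.asarray(in_set, dtype=bool)

    # Rank genes from the highest to the lowest statistic.
    order = np.argsort(-stats)
    stats, in_set = stats[order], in_set[order]

    # Step up at pathway genes (weighted by |stat|), step down at all others.
    weights = np.abs(stats) ** gsea_param
    hit = np.where(in_set, weights, 0.0)
    hit = hit / hit.sum()
    miss = np.where(in_set, 0.0, 1.0 / (~in_set).sum())
    running = np.cumsum(hit - miss)

    es_pos = max(running.max(), 0.0)  # largest positive deviation from zero
    es_neg = min(running.min(), 0.0)  # largest negative deviation from zero

    if score_type == "pos":
        return es_pos
    if score_type == "neg":
        return es_neg
    # "std": keep the deviation with the larger absolute value
    # (assumption: ties fall to the negative side here).
    return es_pos if es_pos > -es_neg else es_neg

# Toy ranking skewed towards negative values; the pathway sits mostly at the bottom.
stats = [2, 1, -1, -2, -3, -4]
in_set = [True, False, False, False, True, True]

es_std = enrichment_score(stats, in_set, "std")
es_pos = enrichment_score(stats, in_set, "pos")
es_neg = enrichment_score(stats, in_set, "neg")
print(es_std, es_pos, es_neg)
```

In this toy case the "std" and "neg" scores coincide at the negative trough of the running sum, while "pos" reports only the small positive peak near the top of the ranking, which matches the pattern of near-zero ES values described above for pathways that were negative under the default settings.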

Could you tell me if I understand this correctly? Could you also explain why the error no longer appears? If my data is unbalanced, that is, skewed towards the negative, why does the "pos" scoreType work well here?

Thanks!

alkurowska · Jun 06 '24 13:06