
gseapy.prerank - errors for small ranked lists

Open · michaelpierrelee opened this issue 7 years ago · 5 comments

Hi, thanks for implementing GSEA in Python; it's now quicker to perform analyses.

I'd like to report a bug in gseapy.prerank. With default parameters, when the ranked list is small (≤ 15 genes), the function returns:

No gene sets passed through filtering condition!!!, try new parameters again! Note: check gene name, gmt file format, or filtering size.

I resolved the problem by lowering the min_size parameter: as long as min_size does not exceed the ranked list size, there are no errors. However, according to the GSEA software documentation, this parameter should apply to the gene sets, not to the expression dataset.
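Why lowering min_size helps: a gene set can share at most as many genes with the ranked list as the list contains, so a min_size larger than the list length filters out every gene set. The sketch below is a hypothetical pre-flight check (check_size_params is not part of gseapy), assuming gseapy's defaults of min_size=15 and max_size=500.

```python
import warnings


def check_size_params(n_ranked_genes: int, min_size: int = 15, max_size: int = 500) -> bool:
    """Hypothetical helper: return True if some gene set could pass the size filter.

    A gene set's overlap with the ranked list can never exceed the number of
    genes in that list, so a min_size above it makes every gene set fail.
    """
    if min_size > max_size:
        warnings.warn(f"min_size ({min_size}) is larger than max_size ({max_size}).")
        return False
    if min_size > n_ranked_genes:
        warnings.warn(
            f"min_size ({min_size}) exceeds the ranked list size ({n_ranked_genes}); "
            "no gene set can pass the filter."
        )
        return False
    return True


# A 12-gene ranked list cannot succeed with the assumed default min_size of 15.
print(check_size_params(12))              # False (and a warning)
print(check_size_params(12, min_size=5))  # True
```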

Even when no genes in the ranked list match the gene sets, the function should emit a warning rather than raise an error.
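A minimal sketch of the requested behaviour, assuming the filter keeps gene sets whose overlap with the ranked list lies between min_size and max_size. filter_gene_sets is a hypothetical helper, not gseapy's internal code; it warns instead of raising when nothing passes.

```python
import warnings


def filter_gene_sets(ranked_genes, gene_sets, min_size=15, max_size=500):
    """Hypothetical helper: keep gene sets whose overlap size is in [min_size, max_size].

    Returns a new dict of {name: sorted overlapping genes}; the input dict is
    only read, never modified. Warns (instead of raising) if nothing passes.
    """
    ranked = set(ranked_genes)
    kept = {}
    for name, members in gene_sets.items():
        overlap = ranked.intersection(members)
        if min_size <= len(overlap) <= max_size:
            kept[name] = sorted(overlap)
    if not kept:
        warnings.warn(
            "No gene sets passed the size filter; check gene names, "
            "the GMT file, or min_size/max_size."
        )
    return kept


gene_sets = {"SET_A": ["TP53", "MDM2", "CDKN1A"], "SET_B": ["EGFR", "KRAS"]}
ranked = ["TP53", "MDM2", "EGFR"]
print(filter_gene_sets(ranked, gene_sets, min_size=2))  # {'SET_A': ['MDM2', 'TP53']}
```

The empty-result case returns an empty dict plus a warning, so the caller decides whether that is fatal.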

Moreover, when an error occurs, the gene set list passed to the function is emptied.
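Until the behaviour changes, one defensive workaround is to hand the function a deep copy of your gene sets, so any mutation happens on the copy. In the sketch below, run_analysis is a hypothetical stand-in that empties its input to mimic the reported behaviour, and safe_call is likewise hypothetical. With gseapy you would pass copy.deepcopy(gene_sets) as the gene_sets argument, assuming your version accepts a dict there.

```python
import copy


def run_analysis(gene_sets):
    """Hypothetical stand-in that empties its input, mimicking the reported bug."""
    gene_sets.clear()
    raise RuntimeError("No gene sets passed through filtering condition!!!")


def safe_call(func, gene_sets, *args, **kwargs):
    """Hypothetical wrapper: call func on a deep copy so the caller's data survives."""
    return func(copy.deepcopy(gene_sets), *args, **kwargs)


my_sets = {"SET_A": ["TP53", "MDM2"], "SET_B": ["EGFR", "KRAS"]}
try:
    safe_call(run_analysis, my_sets)
except RuntimeError as err:
    print("analysis failed:", err)
print(len(my_sets))  # 2: the original dict is untouched
```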

Best regards, Michaël

michaelpierrelee, Oct 03 '18 14:10

This could be done. However, if the ranked gene list has fewer than 15 genes, does a prerank analysis still make sense? @michaelpierrelee

zqfang, Oct 05 '18 09:10

Moreover, in case of errors, the gene set list passed to the function is emptied.

This is really annoying; why is that the case? Would it be possible to change this behaviour? Thank you.

giovp, May 26 '20 19:05

@michaelpierrelee Thank you for your post. I had the same error and had no idea what to do until I saw this post.

In my case, I had to increase the max_size parameter to a value larger than the ranked list size.

Although the error is gone now, I still don't quite understand what the min_size and max_size parameters mean.

qrzhang, Oct 22 '20 03:10

I hit this error too! How can I solve the problem?

wangjiawen2013, Sep 22 '22 07:09

min_size and max_size filter gene sets (pathways) by how many of their member genes overlap with your ranked list: a gene set is kept only if that overlap is at least min_size and at most max_size.
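To choose sensible min_size and max_size values, it can help to look at the overlap counts themselves. The hypothetical helper below counts, for each gene set, how many members appear in the ranked list; these counts are what the two thresholds are compared against. Gene names and set sizes are made up, and the thresholds assume gseapy's defaults of 15 and 500.

```python
def overlap_sizes(ranked_genes, gene_sets):
    """Hypothetical helper: map each gene set name to its overlap with the ranked list."""
    ranked = set(ranked_genes)
    return {name: len(ranked.intersection(members)) for name, members in gene_sets.items()}


MIN_SIZE, MAX_SIZE = 15, 500  # assumed gseapy defaults

ranked = [f"GENE{i}" for i in range(1, 101)]  # 100 ranked genes: GENE1..GENE100
gene_sets = {
    "SMALL": [f"GENE{i}" for i in range(1, 11)],      # GENE1..GENE10
    "MEDIUM": [f"GENE{i}" for i in range(1, 51)],     # GENE1..GENE50
    "PARTIAL": [f"GENE{i}" for i in range(90, 131)],  # GENE90..GENE130
}

sizes = overlap_sizes(ranked, gene_sets)
kept = [name for name, n in sizes.items() if MIN_SIZE <= n <= MAX_SIZE]
print(sizes)  # {'SMALL': 10, 'MEDIUM': 50, 'PARTIAL': 11}
print(kept)   # ['MEDIUM']
```

Note that PARTIAL has 41 members but only 11 of them are in the ranked list, so it is the overlap, not the gene set's own size, that decides.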

To run GSEA, a ranked list containing all expressed genes in your experiment (e.g. the whole transcriptome) is recommended.
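One common way to build such a list is to score every tested gene, not only the significant ones, with a signed statistic. The sketch below assumes a signed -log10(p) score, with the sign taken from the log2 fold change; this metric is an illustrative choice rather than something gseapy requires, and rank_genes is a hypothetical helper. It prints tab-separated gene/score pairs in descending order, the two-column shape of a GSEA .rnk file.

```python
import math


def rank_genes(de_results):
    """Hypothetical helper: score every tested gene, sorted from most up- to most down-regulated.

    de_results: iterable of (gene, log2_fold_change, p_value) covering ALL tested genes.
    Score (assumed, illustrative metric): sign(log2FC) * -log10(p).
    """
    scored = []
    for gene, lfc, p in de_results:
        p = max(p, 1e-300)  # guard against log10(0)
        scored.append((gene, math.copysign(-math.log10(p), lfc)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


de = [("TP53", 1.5, 1e-6), ("EGFR", -2.0, 1e-4), ("ACTB", 0.1, 0.8)]
for gene, score in rank_genes(de):
    print(f"{gene}\t{score:.2f}")
```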

zqfang, Sep 22 '22 18:09

Hi, I'm now getting the same error, "Exception: No gene sets passed through filtering condition". I have 92 genes saved in a DataFrame with their P-values, ranked. It keeps returning this error, even when I try different min_size and permutation_num values.

Cher-HAN, Nov 18 '22 15:11

For everyone still encountering this issue, here's a solution in Python: https://decoupler-py.readthedocs.io/en/latest/generated/decoupler.run_gsea.html

giovp, Nov 18 '22 16:11

Hi @Cher-HAN, you need to make sure your gene symbols match the identifiers used in the GMT file you've chosen. By default, gene symbols should be all uppercase when using Enrichr libraries as GMT input.
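A quick way to diagnose this is to check what fraction of the ranked genes appear in the gene sets at all, before and after uppercasing. match_rate is a hypothetical helper and the symbols are illustrative. Uppercasing only fixes case: a mouse symbol such as Trp53 becomes TRP53, which still does not match the human symbol TP53.

```python
def match_rate(ranked_genes, gene_sets):
    """Hypothetical helper: fraction of ranked genes found in at least one gene set."""
    if not ranked_genes:
        return 0.0
    universe = set().union(*gene_sets.values())
    return sum(gene in universe for gene in ranked_genes) / len(ranked_genes)


# Upper-case symbols, as in Enrichr-style gene set libraries.
gene_sets = {"SET_A": ["TP53", "MDM2", "CDKN1A"], "SET_B": ["EGFR", "KRAS"]}
ranked = ["Trp53", "Mdm2", "Egfr", "Actb"]  # mixed-case symbols

print(match_rate(ranked, gene_sets))                       # 0.0
print(match_rate([g.upper() for g in ranked], gene_sets))  # 0.5 (TRP53 still != TP53)
```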

zqfang, Nov 18 '22 18:11