GSEApy icon indicating copy to clipboard operation
GSEApy copied to clipboard

Why is it possible for a p=0 term to have q=1?

Open searchlie opened this issue 1 year ago • 10 comments

Hi, zqfang. I recently learned of your GSEApy and immediately tried prerank with the data I had on hand and compared it to results of the GSEA provided by Broad institute. Basically, I was very pleased with the results, as NES, p-value, FWER, etc. appeared to be roughly similar for both. However, upon closer inspection of the results, I noticed that only FDR (q value ) behaved strangely:

image

A striking example is a term with p=0 that has q=1 (left). This term had p=0 and q=0 in Broad's GSEA (right):

image

To investigate the issue in depth, I plotted the p and q values and discovered that the relationship can be V-shaped in GSEApy (left), but is roughly monotonic in Broad's results (right).

image

Is this a bug? If it is the intended specification, could you please tell me what could cause such a difference?

Setup

GSEApy version: 0.11.0 Python version: 3.9.6 OS: Debian 11.4

Used rnk file

Used option:

pre_res = gp.prerank(rnk=rnk, gene_sets="c2.cp.kegg.v7.5.1.symbols.gmt", min_size=5)

searchlie avatar Aug 16 '22 10:08 searchlie

Thanks for the comparasion ! I think it's a bug in the recent updated rust code. I will get back to you soon

zqfang avatar Aug 16 '22 16:08 zqfang

Now, it's fixed. see a new release (v0.12.1) for this critial bug fixed

just use pip

pip install -U gseapy

zqfang avatar Aug 16 '22 19:08 zqfang

Thank you for your prompt response! I updated gseapy and now confirmed that the p=0 term has a properly low q value. For reference, I have included some new figures below.

image

When comparing to Broad's GSEA, it still does not ride on the diagonal completely, but it does ride on the lower q-values, so it looks good for practical use. Very much appreciated!!!

searchlie avatar Aug 16 '22 23:08 searchlie

Thank you very much for the feeback. it really helps. So others could have a reference here.

there are 3 reason have you don't align well for FDR q:

(1) I used an approximate method to speed up the calcualtion of FDR q. which is not exactly the same implement as GSEA_R here. the more permutation num, the closer to GSEA_R orignial implementation values.

(2) GSEApy did not use an extra adjust FDRq step to trim FDR q values. ( So I believe GSEApy will have larger FDR q values)

(3) the permutation procedure is hard to make the comparsion fail between GSEA-P and GSEApy.

zqfang avatar Aug 16 '22 23:08 zqfang

Hi @searchlie, unfortunately, I found another critial bug when calculating NES values in Rust code. It affects the FDR calculation and Now is fixed

This bug does not exist for numpy version of gseapy (<=v0.10.8). Please update to 0.13.0 as soon as possible.

pip install -U gseapy

If you don't mind, you could set me the GSEA-P results, I could make a comparsion for FDRs

zqfang avatar Sep 02 '22 18:09 zqfang

Hi @zqfang. Thanks for the update! I immediately used v0.13.0 and compared its q value with the q value output by the Broad instiitute software. It seems to correlate definitely better than v0.12.0! I appreciate your fantastic job! image

searchlie avatar Sep 03 '22 09:09 searchlie

Thanks for the comparsion, too. Yeah. It's my bad. I should notice the weird thing more earlier. I thought the previously result was caused by the approximate method I've implememented. It turns out that It was a mistake I've made a when converting numpy to Rust code.

zqfang avatar Sep 03 '22 20:09 zqfang

@zqfang This is not a bug report, but a report about a hidden change I noticed recently. In previous versions, the results (res2d) were sorted by ES or NES value, but in the latest version, they seemed to be sorted by FWER p-val value. Is my understanding correct? (I would like to check the code directly, but I can't read the Rust code...sorry.)

searchlie avatar Sep 30 '22 01:09 searchlie

in the previous version, the final output results are sorted by FDR. Now are sorted by the absolute value of NES, which will make the sorting order more deterministic.

The rust code only used for computing.

The sorting only affects the row orders in the output table. You could sort the table whatever you like.

zqfang avatar Sep 30 '22 18:09 zqfang

Very well understood, thank you!

searchlie avatar Sep 30 '22 23:09 searchlie