clusterProfiler icon indicating copy to clipboard operation
clusterProfiler copied to clipboard

P.adjust values in dotplot do not match GO analysis values

Open lillianhorin opened this issue 2 years ago • 7 comments

Hello! Hope all is well.

I am using both the enrichGO and gseGO functions. The analyses are going well, but I am uncovering a phenomenon I quite honestly don't understand. I am getting low p.adjust values in the analysis, but the p.adjust with dotplot is much higher, even after sorting for p.adjust.

Here is the reproducible example.

library(clusterProfiler)
data(geneList, package="DOSE")
gene <- names(geneList)[abs(geneList) > 2]

ego <- enrichGO(gene          = gene,
                universe      = names(geneList),
                OrgDb         = org.Hs.eg.db,
                ont           = "CC",
                pAdjustMethod = "BH",
                pvalueCutoff  = 0.01,
                qvalueCutoff  = 0.05,
                readable      = TRUE)
head(ego)

image

Immediately upon examining head(ego), we see many p.adjust values as low as 5e-8. But when we plot this using a dotplot, these all seem to disappear. The lowest p.adjust value on the plot is much higher.

dotplot( ego, 
         x = "p.adjust",
         color = "p.adjust",
         showCategory = 20,
         orderBy="p.adjust",
         font.size = 8)

Graphical output: image

The discrepancy is even more noticeable if I sort by default GeneRatio, where values that are supposed to be in the e-8 range have much higher (more red) p.adjust.

image

I have this same issue with gseGO. It seems the issue is once it is being passed into the dotplot function.

Would love your insight on this - thank you so much!

lillianhorin avatar Jul 31 '23 03:07 lillianhorin

I ran the example code you provided, and I indeed got identical results from the enrichGO() analysis:

> library(clusterProfiler)
> data(geneList, package="DOSE")
> gene <- names(geneList)[abs(geneList) > 2]
> 
> ego <- enrichGO(gene          = gene,
+                 universe      = names(geneList),
+                 OrgDb         = org.Hs.eg.db,
+                 ont           = "CC",
+                 pAdjustMethod = "BH",
+                 pvalueCutoff  = 0.01,
+                 qvalueCutoff  = 0.05,
+                 readable      = TRUE)
> head(ego)
                   ID                              Description GeneRatio
GO:0005819 GO:0005819                                  spindle    25/201
GO:0000775 GO:0000775           chromosome, centromeric region    19/201
GO:0072686 GO:0072686                          mitotic spindle    17/201
GO:0000779 GO:0000779 condensed chromosome, centromeric region    16/201
GO:0098687 GO:0098687                       chromosomal region    23/201
GO:0000793 GO:0000793                     condensed chromosome    19/201
             BgRatio       pvalue     p.adjust       qvalue
GO:0005819 329/11780 3.448666e-10 5.624574e-08 5.086717e-08
GO:0000775 189/11780 4.988612e-10 5.624574e-08 5.086717e-08
GO:0072686 149/11780 5.941452e-10 5.624574e-08 5.086717e-08
GO:0000779 137/11780 1.372680e-09 9.746031e-08 8.814053e-08
GO:0098687 308/11780 2.593769e-09 1.407498e-07 1.272904e-07
GO:0000793 210/11780 2.973586e-09 1.407498e-07 1.272904e-07
                                                                                                                                                        geneID
GO:0005819 CDCA8/CDC20/KIF23/CENPE/ASPM/DLGAP5/SKA1/NUSAP1/TPX2/TACC3/NEK2/CDK1/MAD2L1/KIF18A/KIF11/TRAT1/TTK/AURKB/PRC1/KIFC1/KIF18B/KIF20A/AURKA/CCNB1/KIF4A
GO:0000775                                      CDCA8/CENPE/NDC80/TOP2A/HJURP/SKA1/NEK2/CENPM/CENPN/ERCC6L/MAD2L1/KIF18A/CDT1/BIRC5/EZH2/TTK/NCAPG/AURKB/CCNB1
GO:0072686                                                KIF23/CENPE/ASPM/SKA1/NUSAP1/TPX2/TACC3/CDK1/MAD2L1/KIF18A/KIF11/TRAT1/AURKB/PRC1/KIFC1/KIF18B/AURKA
GO:0000779                                                       CENPE/NDC80/HJURP/SKA1/NEK2/CENPM/CENPN/ERCC6L/MAD2L1/KIF18A/CDT1/BIRC5/TTK/NCAPG/AURKB/CCNB1
GO:0098687             CDCA8/CENPE/NDC80/TOP2A/HJURP/SKA1/NEK2/CENPM/RAD51AP1/CENPN/CDK1/ERCC6L/MAD2L1/KIF18A/CDT1/BIRC5/EZH2/TTK/NCAPG/AURKB/CHEK1/CCNB1/MCM5
GO:0000793                                     CENPE/NDC80/TOP2A/NCAPH/HJURP/SKA1/NEK2/CENPM/CENPN/ERCC6L/MAD2L1/KIF18A/CDT1/BIRC5/TTK/NCAPG/AURKB/CHEK1/CCNB1
           Count
GO:0005819    25
GO:0000775    19
GO:0072686    17
GO:0000779    16
GO:0098687    23
GO:0000793    19
>

When I generated the dotplot with showCategory = 20, I also got the same plot (plot not shown).

Yet, when I limited the number of categories to 6 (which is the same number of rows head() returns), I got a plot that clearly matches the values presented in the table. In other words, by setting showCategory = 20 you create a graph that seems to be off (because of the color scale), but which actually isn't??? Please note that 'more red' means lower p.adjust, not higher!

> dotplot( ego, 
+          x = "p.adjust",
+          color = "p.adjust",
+          showCategory = 6,
+          orderBy="p.adjust",
+          font.size = 8)
>

image

guidohooiveld avatar Aug 15 '23 11:08 guidohooiveld

@guidohooiveld Thank you for your response! his is very helpful. So does that mean it is not recommended to use a showCategory above 6? Because even if it actually is correct but only seems to be off, the initial perception is everything when presenting the data.

lillianhorin avatar Aug 17 '23 18:08 lillianhorin

Did you solve the issue? I have the same problem.

mail2senthil avatar Jun 18 '24 17:06 mail2senthil

What exactly is your problem? Consider opening a new issue. Be sure to provide sufficient details.

guidohooiveld avatar Jun 18 '24 17:06 guidohooiveld

I have the same problem as reported by [lillianhorin]. The p-value/adjp-value matches between the gsea result and dotplot when I set showCategory=6. I get conflicting values when I set any number above 6.

mail2senthil avatar Jun 18 '24 17:06 mail2senthil

Could you please provide some reproducible code to illustrate the issue?

I am asking because to me it seems everything is correct (i.e. values visualized in the dotplot match with those in ego...)

> ## load required libraries
> library(clusterProfiler)
> library(org.Hs.eg.db)
> 
> ## load included example data, and select the 'relevant' genes.
> data(geneList, package="DOSE")
> gene <- names(geneList)[abs(geneList) > 2]
> 
> ## run enrichGO
> ego <- enrichGO(gene          = gene,
+                 universe      = names(geneList),
+                 OrgDb         = org.Hs.eg.db,
+                 ont           = "CC",
+                 pAdjustMethod = "BH",
+                 pvalueCutoff  = 0.01,
+                 qvalueCutoff  = 0.05,
+                 readable      = TRUE)
> 
> ## specifically extract p-, adjusted- and count
> ## for top.n gene sets
> 
> ## set top.n to 9
> top.n = 9
> 
> ego[1:top.n,c("pvalue","p.adjust", "Count")]
                 pvalue     p.adjust Count
GO:0000775 1.422125e-11 4.024615e-09    21
GO:0000779 3.719635e-11 5.138722e-09    18
GO:0072686 9.226677e-11 5.138722e-09    18
GO:0005819 9.727050e-11 5.138722e-09    26
GO:0000793 9.895898e-11 5.138722e-09    21
GO:0000776 1.191495e-10 5.138722e-09    17
GO:0098687 1.271062e-10 5.138722e-09    25
GO:0005876 1.035478e-07 3.663004e-06    10
GO:0005875 4.312652e-07 1.286932e-05    12
> 
> 
> dotplot( ego, 
+          x = "p.adjust",
+          color = "p.adjust",
+          showCategory = top.n,
+          orderBy="p.adjust",
+          font.size = 8)
> 

dotplot below matches values above....!

image

 > ## repeat, with e.g. top.n = 14
> top.n = 14
> ego[1:top.n,c("pvalue","p.adjust", "Count")]
                 pvalue     p.adjust Count
GO:0000775 1.422125e-11 4.024615e-09    21
GO:0000779 3.719635e-11 5.138722e-09    18
GO:0072686 9.226677e-11 5.138722e-09    18
GO:0005819 9.727050e-11 5.138722e-09    26
GO:0000793 9.895898e-11 5.138722e-09    21
GO:0000776 1.191495e-10 5.138722e-09    17
GO:0098687 1.271062e-10 5.138722e-09    25
GO:0005876 1.035478e-07 3.663004e-06    10
GO:0005875 4.312652e-07 1.286932e-05    12
GO:0030312 5.002210e-07 1.286932e-05    24
GO:0031012 5.002210e-07 1.286932e-05    24
GO:1990023 5.565242e-07 1.312470e-05     5
GO:0051233 8.312893e-07 1.809653e-05     7
GO:0005874 2.393495e-06 4.838279e-05    19
> 
> 
> dotplot( ego, 
+          x = "p.adjust",
+          color = "p.adjust",
+          showCategory = top.n,
+          orderBy="p.adjust",
+          font.size = 8)
> 
>

Again, dotplot below matches values above....!

image

guidohooiveld avatar Jun 18 '24 18:06 guidohooiveld

OK, after re-reading your comment I noticed you are doing a GO-based GSEA analysis...

> res2 <- gseGO(geneList = geneList,
+                ont = "CC",
+                OrgDb = org.Hs.eg.db,
+                keyType = "ENTREZID",
+                minGSSize = 10,
+                maxGSSize = 500,
+                eps = 0,
+                pvalueCutoff = 0.05,
+                pAdjustMethod = "BH"
+                )
using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).

preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
> 
> ## 'Count' now refers to 'the number of core enriched genes'
> ## Add these to res2 data.frame.
> ## Number equals number of forward slashes + 1.
> library(stringr)
> res2@result$Count <- str_count(res2$core_enrichment,"\\/")  + 1
> 
> 
> 
> top.n = 9
> res2[1:top.n,c("Description","pvalue","p.adjust", "Count")]
                                        Description       pvalue     p.adjust Count
GO:0098687                       chromosomal region 7.090676e-23 4.750753e-20    88
GO:0000775           chromosome, centromeric region 1.014018e-20 3.396961e-18    43
GO:0000793                     condensed chromosome 4.510418e-18 1.007327e-15    46
GO:0000779 condensed chromosome, centromeric region 1.064529e-17 1.783086e-15    36
GO:0000228                       nuclear chromosome 3.200622e-17 4.288833e-15    59
GO:0030312         external encapsulating structure 2.071867e-16 1.983073e-14   159
GO:0031012                     extracellular matrix 2.071867e-16 1.983073e-14   159
GO:0000776                              kinetochore 3.294246e-16 2.468340e-14    31
GO:0062023 collagen-containing extracellular matrix 3.315680e-16 2.468340e-14   133
> 
> dotplot( res2, 
+          x = "p.adjust",
+          color = "p.adjust",
+          showCategory = top.n,
+          orderBy="p.adjust",
+          font.size = 8)
> 
> 

image

> top.n = 19
> res2[1:top.n,c("Description","pvalue","p.adjust", "Count")]
                                        Description       pvalue     p.adjust Count
GO:0098687                       chromosomal region 7.090676e-23 4.750753e-20    88
GO:0000775           chromosome, centromeric region 1.014018e-20 3.396961e-18    43
GO:0000793                     condensed chromosome 4.510418e-18 1.007327e-15    46
GO:0000779 condensed chromosome, centromeric region 1.064529e-17 1.783086e-15    36
GO:0000228                       nuclear chromosome 3.200622e-17 4.288833e-15    59
GO:0030312         external encapsulating structure 2.071867e-16 1.983073e-14   159
GO:0031012                     extracellular matrix 2.071867e-16 1.983073e-14   159
GO:0000776                              kinetochore 3.294246e-16 2.468340e-14    31
GO:0062023 collagen-containing extracellular matrix 3.315680e-16 2.468340e-14   133
GO:0005819                                  spindle 5.771008e-16 3.866575e-14    68
GO:0072686                          mitotic spindle 1.436490e-11 8.749531e-10    28
GO:0030496                                  midbody 1.033020e-08 5.767697e-07    19
GO:0051233                          spindle midzone 6.674388e-08 3.439877e-06    11
GO:0005604                        basement membrane 8.799662e-08 4.211267e-06    37
GO:0071162                              CMG complex 2.024650e-07 8.478222e-06    11
GO:0031261    DNA replication preinitiation complex 1.933702e-07 8.478222e-06    11
GO:0000940                        outer kinetochore 9.757268e-07 3.845512e-05    10
GO:0005874                              microtubule 1.423733e-06 5.299452e-05    54
GO:0005876                      spindle microtubule 2.721809e-06 9.597956e-05    13
> 
> dotplot( res2, 
+          x = "p.adjust",
+          color = "p.adjust",
+          showCategory = top.n,
+          orderBy="p.adjust",
+          font.size = 8)
> 
> 

image

guidohooiveld avatar Jun 18 '24 19:06 guidohooiveld