enrichplot icon indicating copy to clipboard operation
enrichplot copied to clipboard

Dotplot Circle Size Ambiguous

Open DarioS opened this issue 11 months ago • 1 comments

Dot plot is similar to bar plot with the capability to encode another score as dot size.

Is circle diameter the number of DE genes or number of total genes in pathway after filtering by the gene universe?

DarioS avatar Dec 12 '24 19:12 DarioS

I agree that this is not (very well) documented.

According to the help page of dotplot (?dotplot), the default setting of the argument size is NULL. If needed, it can be specified to either be "geneRatio", "Percentage" and "count".

If not specified (i.e. size = NULL), checking the source code shows that the 'value' of size is depending on x; thus on which values are plotted on the x-axis:

https://github.com/YuLab-SMU/enrichplot/blob/0ca2fcf18a4178e28ea8578d9b21e471448ef5fa/R/dotplot.R#L148-L171

Since for the plotting of almost all types of analyses the default is x = "GeneRatio", size thus is set to Count.

In the next chunk of code the value of Count is calculated through the function fortify. It then becomes clear that:

For an ORA, Count equals the number of foreground genes (for example genes present in a list of DEG) that are annotated to that gene set / pathway.

For a GSEA, Count equals the number of leading edge (a.k.a. core enriched) genes of that gene set / pathway.

https://github.com/YuLab-SMU/enrichplot/blob/0ca2fcf18a4178e28ea8578d9b21e471448ef5fa/R/dotplot.R#L173-L181

Some code to demonstrate this:


> library(clusterProfiler)
> library(ggplot2) ## for function fortify
> 
> data(geneList, package="DOSE")
> gene <- names(geneList)[abs(geneList) > 2]
> 
> ORA.kegg <- enrichKEGG(gene         = gene,
+                        organism     = 'hsa',
+                        pvalueCutoff = 0.05)
> 
> 
> 
> GSEA.kegg <- gseKEGG(geneList     = geneList,
+                      organism     = 'hsa',
+                      eps          = 0,
+                      pvalueCutoff = 0.05)
using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).

preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
> 
> 
> 
> ## line 178 of the source code
> ## https://github.com/YuLab-SMU/enrichplot/blob/0ca2fcf18a4178e28ea8578d9b21e471448ef5fa/R/dotplot.R#L178
> ## set additional arguments
> showCategory=3
> split = NULL
> 

> ## for ORA result
> fortify(ORA.kegg, showCategory = showCategory, split=split)
                   category           subcategory       ID         Description
hsa04110 Cellular Processes Cell growth and death hsa04110          Cell cycle
hsa04114 Cellular Processes Cell growth and death hsa04114      Oocyte meiosis
hsa04218 Cellular Processes Cell growth and death hsa04218 Cellular senescence
          GeneRatio    BgRatio RichFactor FoldEnrichment   zScore       pvalue
hsa04110 0.14150943 0.01779680 0.09493671       7.951397 9.691235 4.684809e-10
hsa04114 0.09433962 0.01565668 0.07194245       6.025519 6.564387 5.673603e-06
hsa04218 0.09433962 0.01768416 0.06369427       5.334695 6.023681 1.667772e-05
             p.adjust       qvalue
hsa04110 1.007234e-07 9.862756e-08
hsa04114 6.099123e-04 5.972213e-04
hsa04218 1.195237e-03 1.170366e-03
                                                                           geneID
hsa04110 8318/991/9133/10403/890/983/4085/81620/7272/9212/1111/9319/891/4174/9232
hsa04114                          991/9133/983/4085/51806/6790/891/9232/3708/5241
hsa04218                           2305/4605/9133/890/983/51806/1111/891/776/3708
         Count
hsa04110    15
hsa04114    10
hsa04218    10
> 
> ## check value for Count for set 'Cell cycle', that has id hsa04110
> fortify(ORA.kegg, showCategory = showCategory, split=split)["hsa04110","Count"]
[1] 15
> 
> dotplot(ORA.kegg)
>

image

> ## for GSEA result
> fortify(GSEA.kegg, showCategory = showCategory, split=split)
               ID             Description setSize enrichmentScore      NES
hsa04110 hsa04110              Cell cycle     139       0.6637551 2.850548
hsa03050 hsa03050              Proteasome      43       0.7094784 2.489242
hsa04657 hsa04657 IL-17 signaling pathway      85       0.5622094 2.205005
               pvalue     p.adjust       qvalue rank
hsa04110 5.567944e-21 1.926509e-18 1.435943e-18 1155
hsa03050 1.425077e-08 2.465384e-06 1.837600e-06 2516
hsa04657 2.226086e-08 2.567419e-06 1.913652e-06 2880
                           leading_edge
hsa04110  tags=36%, list=9%, signal=33%
hsa03050 tags=65%, list=20%, signal=52%
hsa04657 tags=49%, list=23%, signal=38%
                                                                                                                                                                                                                                              core_enrichment
hsa04110 8318/991/9133/10403/890/983/4085/81620/7272/9212/1111/9319/891/4174/9232/4171/993/990/5347/701/9700/898/23594/4998/9134/4175/4173/10926/6502/994/699/4609/5111/26271/1869/1029/8317/4176/2810/3066/1871/1031/9088/995/1019/4172/5885/11200/7027/1875
hsa03050                                                                                                      5688/5709/5698/5693/3458/5713/11047/5721/5691/5685/5690/5684/5686/5695/10213/23198/7979/5699/5714/5702/5708/5692/5704/5683/5694/5718/51371/5682
hsa04657                                4312/6280/6279/6278/3627/2921/6364/8061/4318/3576/3934/6347/727897/1051/6354/3458/6361/6374/2919/9618/5603/7128/1994/7124/3569/8772/5743/7186/3596/6356/5594/4792/9641/1147/2932/6300/5597/27190/1432/7184/64806/3326
         Count     .sign GeneRatio
hsa04110    50 activated 0.3597122
hsa03050    28 activated 0.6511628
hsa04657    42 activated 0.4941176
> 
> ## check value for Count for set 'Ribosome biogenesis in eukaryotes', that has id hsa03008
> fortify(GSEA.kegg, showCategory = 10, split=split)["hsa03008","Count"]
[1] 41
> 
> 
> dotplot(GSEA.kegg, split=".sign") + facet_grid(.~.sign)
> 

image

guidohooiveld avatar Dec 16 '24 11:12 guidohooiveld