enrichplot
enrichplot copied to clipboard
Dotplot Circle Size Ambiguous
Dot plot is similar to bar plot with the capability to encode another score as dot size.
Is circle diameter the number of DE genes or number of total genes in pathway after filtering by the gene universe?
I agree that this is not (very well) documented.
According to the help page of dotplot (?dotplot), the default setting of the argument size is NULL. If needed, it can be specified to either be "geneRatio", "Percentage" and "count".
If not specified (i.e. size = NULL), checking the source code shows that the 'value' of size is depending on x; thus on which values are plotted on the x-axis:
https://github.com/YuLab-SMU/enrichplot/blob/0ca2fcf18a4178e28ea8578d9b21e471448ef5fa/R/dotplot.R#L148-L171
Since for the plotting of almost all types of analyses the default is x = "GeneRatio", size thus is set to Count.
In the next chunk of code the value of Count is calculated through the function fortify. It then becomes clear that:
For an ORA, Count equals the number of foreground genes (for example genes present in a list of DEG) that are annotated to that gene set / pathway.
For a GSEA, Count equals the number of leading edge (a.k.a. core enriched) genes of that gene set / pathway.
https://github.com/YuLab-SMU/enrichplot/blob/0ca2fcf18a4178e28ea8578d9b21e471448ef5fa/R/dotplot.R#L173-L181
Some code to demonstrate this:
> library(clusterProfiler)
> library(ggplot2) ## for function fortify
>
> data(geneList, package="DOSE")
> gene <- names(geneList)[abs(geneList) > 2]
>
> ORA.kegg <- enrichKEGG(gene = gene,
+ organism = 'hsa',
+ pvalueCutoff = 0.05)
>
>
>
> GSEA.kegg <- gseKEGG(geneList = geneList,
+ organism = 'hsa',
+ eps = 0,
+ pvalueCutoff = 0.05)
using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).
preparing geneSet collections...
GSEA analysis...
leading edge analysis...
done...
>
>
>
> ## line 178 of the source code
> ## https://github.com/YuLab-SMU/enrichplot/blob/0ca2fcf18a4178e28ea8578d9b21e471448ef5fa/R/dotplot.R#L178
> ## set additional arguments
> showCategory=3
> split = NULL
>
> ## for ORA result
> fortify(ORA.kegg, showCategory = showCategory, split=split)
category subcategory ID Description
hsa04110 Cellular Processes Cell growth and death hsa04110 Cell cycle
hsa04114 Cellular Processes Cell growth and death hsa04114 Oocyte meiosis
hsa04218 Cellular Processes Cell growth and death hsa04218 Cellular senescence
GeneRatio BgRatio RichFactor FoldEnrichment zScore pvalue
hsa04110 0.14150943 0.01779680 0.09493671 7.951397 9.691235 4.684809e-10
hsa04114 0.09433962 0.01565668 0.07194245 6.025519 6.564387 5.673603e-06
hsa04218 0.09433962 0.01768416 0.06369427 5.334695 6.023681 1.667772e-05
p.adjust qvalue
hsa04110 1.007234e-07 9.862756e-08
hsa04114 6.099123e-04 5.972213e-04
hsa04218 1.195237e-03 1.170366e-03
geneID
hsa04110 8318/991/9133/10403/890/983/4085/81620/7272/9212/1111/9319/891/4174/9232
hsa04114 991/9133/983/4085/51806/6790/891/9232/3708/5241
hsa04218 2305/4605/9133/890/983/51806/1111/891/776/3708
Count
hsa04110 15
hsa04114 10
hsa04218 10
>
> ## check value for Count for set 'Cell cycle', that has id hsa04110
> fortify(ORA.kegg, showCategory = showCategory, split=split)["hsa04110","Count"]
[1] 15
>
> dotplot(ORA.kegg)
>
> ## for GSEA result
> fortify(GSEA.kegg, showCategory = showCategory, split=split)
ID Description setSize enrichmentScore NES
hsa04110 hsa04110 Cell cycle 139 0.6637551 2.850548
hsa03050 hsa03050 Proteasome 43 0.7094784 2.489242
hsa04657 hsa04657 IL-17 signaling pathway 85 0.5622094 2.205005
pvalue p.adjust qvalue rank
hsa04110 5.567944e-21 1.926509e-18 1.435943e-18 1155
hsa03050 1.425077e-08 2.465384e-06 1.837600e-06 2516
hsa04657 2.226086e-08 2.567419e-06 1.913652e-06 2880
leading_edge
hsa04110 tags=36%, list=9%, signal=33%
hsa03050 tags=65%, list=20%, signal=52%
hsa04657 tags=49%, list=23%, signal=38%
core_enrichment
hsa04110 8318/991/9133/10403/890/983/4085/81620/7272/9212/1111/9319/891/4174/9232/4171/993/990/5347/701/9700/898/23594/4998/9134/4175/4173/10926/6502/994/699/4609/5111/26271/1869/1029/8317/4176/2810/3066/1871/1031/9088/995/1019/4172/5885/11200/7027/1875
hsa03050 5688/5709/5698/5693/3458/5713/11047/5721/5691/5685/5690/5684/5686/5695/10213/23198/7979/5699/5714/5702/5708/5692/5704/5683/5694/5718/51371/5682
hsa04657 4312/6280/6279/6278/3627/2921/6364/8061/4318/3576/3934/6347/727897/1051/6354/3458/6361/6374/2919/9618/5603/7128/1994/7124/3569/8772/5743/7186/3596/6356/5594/4792/9641/1147/2932/6300/5597/27190/1432/7184/64806/3326
Count .sign GeneRatio
hsa04110 50 activated 0.3597122
hsa03050 28 activated 0.6511628
hsa04657 42 activated 0.4941176
>
> ## check value for Count for set 'Ribosome biogenesis in eukaryotes', that has id hsa03008
> fortify(GSEA.kegg, showCategory = 10, split=split)["hsa03008","Count"]
[1] 41
>
>
> dotplot(GSEA.kegg, split=".sign") + facet_grid(.~.sign)
>