enrichplot
enrichplot copied to clipboard
Inclusion and exclusion criteria used by dotplot
Hi,
I encountered an issue where some pathways that were passed the significant cutoffs (adjusted p<0.05 and qvalue<0.2) were not included in the dotplot. Below includes the dotplot and the top pathways returned by function enrichKEGG. Any inputs on how to address this issue are greatly appreciated. Thank you!
Did you check the help page of the function dotplot?
Type ?dotplot
The default number of categories to be plotted is 10, if you would like to plot more categories (sets), increase the value of the argument showCategory (from 10).
Thank you for the input, guidohooiveld. I understand that the number of categories to be plotted by default is 10 and it can be changed. My question is how it determines the top 10 (or 20) categories, and it does seems to be the adjusted p-value, qvalue or generatio. For example, the barplot uses the adjusted p-value to rank the categories
If you run a gene set analysis, clusterProfiler filters these sets based on significance, and then orders them according to the value of p.adjust. See e.g. these threads (which you also found, I just saw):
https://github.com/YuLab-SMU/clusterProfiler/issues/710#issuecomment-2244623389
https://github.com/YuLab-SMU/enrichplot/issues/270#issuecomment-2012958275
Within the function dotplot, the argument x defines which metric to use to 'order' the gene sets (number as defined by showCategory) on the x-axis of the plot. Hopefully you have seen that the default value is x = "GeneRatio". If you would like to use another metric, you should change x. x can be any of colnames(as.data.frame(res)) (see below for code).
> ## load library
> library(clusterProfiler)
> library(enrichplot)
>
> ## load sample data
> data(geneList, package="DOSE")
> gene <- names(geneList)[abs(geneList) > 2]
>
> ## perform enrichment analysis
> ## calculate adjusted p-values according to BH,
> ## and set pvalueCutoff to 0.05.
> res <- enrichKEGG(gene = gene,
+ organism = 'hsa',
+ pvalueCutoff = 0.05,
+ pAdjustMethod = "BH")
Reading KEGG annotation online: "https://rest.kegg.jp/link/hsa/pathway"...
Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/hsa"...
>
> ## which columns/metrics are available?
> colnames(as.data.frame(res))
[1] "category" "subcategory" "ID" "Description"
[5] "GeneRatio" "BgRatio" "RichFactor" "FoldEnrichment"
[9] "zScore" "pvalue" "p.adjust" "qvalue"
[13] "geneID" "Count"
>
> ## default dotplot (x = "GeneRatio")
> dotplot(res)
>
> ## order on (for example) zScore
> dotplot(res, x="zScore")
>
> ## order on p.adjust (=FDR)
> dotplot(res, x="p.adjust")
>
> ## note that using such small values for plotting
> ## doesn't look nice from an aesthetics point of view.
> ## Therefore manually add column with -log(p.adjust)
> ## to results.
>
> res@result$minlog10p.adj <- -log10(res@result$p.adjust)
>
> dotplot(res, x="minlog10p.adj")
>
> ## To make nice x-axis label
> library(ggplot2)
> dotplot(res, x="minlog10p.adj") + xlab(expression( -log['10']*(FDR) ))
>
Thank you for the detailed explanation. However, if you compare the dotplot and the table I attached, I wonder why the first pathway in the table, Pathways of neurodegeneration - multiple diseases, was not included in the dotplot. Its generatio is higher than the first pathway in the dotplot, Human papillomavirus infection, and its adjusted pvalue also passed the cutoff of 0.05.
I find your table difficult to read....
Can you paste the output from this line of code? as.data.frame(res)
Using the result object from my analysis:
> as.data.frame(res)
category
hsa04110 Cellular Processes
hsa04114 Cellular Processes
hsa04218 Cellular Processes
hsa04061 Environmental Information Processing
hsa03320 Organismal Systems
hsa04814 Cellular Processes
hsa04914 Organismal Systems
subcategory ID
hsa04110 Cell growth and death hsa04110
hsa04114 Cell growth and death hsa04114
hsa04218 Cell growth and death hsa04218
hsa04061 Signaling molecules and interaction hsa04061
hsa03320 Endocrine system hsa03320
hsa04814 Cell motility hsa04814
hsa04914 Endocrine system hsa04914
Description
hsa04110 Cell cycle
hsa04114 Oocyte meiosis
hsa04218 Cellular senescence
hsa04061 Viral protein interaction with cytokine and cytokine receptor
hsa03320 PPAR signaling pathway
hsa04814 Motor proteins
hsa04914 Progesterone-mediated oocyte maturation
GeneRatio BgRatio RichFactor FoldEnrichment zScore pvalue
hsa04110 15/106 158/8546 0.09493671 7.654048 9.460695 7.860673e-10
hsa04114 10/106 139/8546 0.07194245 5.800190 6.394153 7.902735e-06
hsa04218 10/106 157/8546 0.06369427 5.135200 5.860423 2.306908e-05
hsa04061 8/106 100/8546 0.08000000 6.449811 6.143186 3.110376e-05
hsa03320 7/106 76/8546 0.09210526 7.425770 6.305623 4.007301e-05
hsa04814 10/106 197/8546 0.05076142 4.092520 4.921145 1.571086e-04
hsa04914 7/106 111/8546 0.06306306 5.084311 4.853728 4.367132e-04
p.adjust qvalue
hsa04110 1.713627e-07 1.696251e-07
hsa04114 8.613982e-04 8.526636e-04
hsa04218 1.676353e-03 1.659355e-03
hsa04061 1.695155e-03 1.677966e-03
hsa03320 1.747183e-03 1.729467e-03
hsa04814 5.708278e-03 5.650395e-03
hsa04914 1.360050e-02 1.346259e-02
geneID
hsa04110 8318/991/9133/10403/890/983/4085/81620/7272/9212/1111/9319/891/4174/9232
hsa04114 991/9133/983/4085/51806/6790/891/9232/3708/5241
hsa04218 2305/4605/9133/890/983/51806/1111/891/776/3708
hsa04061 3627/10563/6373/4283/6362/6355/9547/1524
hsa03320 4312/9415/9370/5105/2167/3158/5346
hsa04814 9493/1062/81930/3832/3833/146909/10112/24137/4629/7802
hsa04914 9133/890/983/4085/6790/891/5241
Count minlog10p.adj
hsa04110 15 6.766084
hsa04114 10 3.064796
hsa04218 10 2.775635
hsa04061 8 2.770791
hsa03320 7 2.757662
hsa04814 10 2.243495
hsa04914 7 1.866445
>
Sorry, it would be better to show the output from the 3 lines of code below.
Note that I save the dotplot to object p. This has the advantage that all relevant information is present, which allows much easy viewing!
> p <- dotplot(res)
> p$data[order(p$data$GeneRatio, decreasing = TRUE)[1:10], ]
> print(p)
Output: (note that 3x NA are present (because only 7 sets are significant, but the first 10 should be returned...)
category subcategory ID
hsa04110 Cellular Processes Cell growth and death hsa04110
hsa04114 Cellular Processes Cell growth and death hsa04114
hsa04218 Cellular Processes Cell growth and death hsa04218
hsa04814 Cellular Processes Cell motility hsa04814
hsa04061 Environmental Information Processing Signaling molecules and interaction hsa04061
hsa03320 Organismal Systems Endocrine system hsa03320
hsa04914 Organismal Systems Endocrine system hsa04914
NA <NA> <NA> <NA>
NA.1 <NA> <NA> <NA>
NA.2 <NA> <NA> <NA>
Description GeneRatio BgRatio
hsa04110 Cell cycle 0.14150943 0.018488182
hsa04114 Oocyte meiosis 0.09433962 0.016264919
hsa04218 Cellular senescence 0.09433962 0.018371168
hsa04814 Motor proteins 0.09433962 0.023051720
hsa04061 Viral protein interaction with cytokine and cytokine receptor 0.07547170 0.011701381
hsa03320 PPAR signaling pathway 0.06603774 0.008893049
hsa04914 Progesterone-mediated oocyte maturation 0.06603774 0.012988533
NA <NA> NA NA
NA.1 <NA> NA NA
NA.2 <NA> NA NA
RichFactor FoldEnrichment zScore pvalue p.adjust qvalue
hsa04110 0.09493671 7.654048 9.460695 7.860673e-10 1.713627e-07 1.696251e-07
hsa04114 0.07194245 5.800190 6.394153 7.902735e-06 8.613982e-04 8.526636e-04
hsa04218 0.06369427 5.135200 5.860423 2.306908e-05 1.676353e-03 1.659355e-03
hsa04814 0.05076142 4.092520 4.921145 1.571086e-04 5.708278e-03 5.650395e-03
hsa04061 0.08000000 6.449811 6.143186 3.110376e-05 1.695155e-03 1.677966e-03
hsa03320 0.09210526 7.425770 6.305623 4.007301e-05 1.747183e-03 1.729467e-03
hsa04914 0.06306306 5.084311 4.853728 4.367132e-04 1.360050e-02 1.346259e-02
NA NA NA NA NA NA NA
NA.1 NA NA NA NA NA NA
NA.2 NA NA NA NA NA NA
geneID Count
hsa04110 8318/991/9133/10403/890/983/4085/81620/7272/9212/1111/9319/891/4174/9232 15
hsa04114 991/9133/983/4085/51806/6790/891/9232/3708/5241 10
hsa04218 2305/4605/9133/890/983/51806/1111/891/776/3708 10
hsa04814 9493/1062/81930/3832/3833/146909/10112/24137/4629/7802 10
hsa04061 3627/10563/6373/4283/6362/6355/9547/1524 8
hsa03320 4312/9415/9370/5105/2167/3158/5346 7
hsa04914 9133/890/983/4085/6790/891/5241 7
NA <NA> NA
NA.1 <NA> NA
NA.2 <NA> NA
minlog10p.adj x
hsa04110 6.766084 0.14150943
hsa04114 3.064796 0.09433962
hsa04218 2.775635 0.09433962
hsa04814 2.243495 0.09433962
hsa04061 2.770791 0.07547170
hsa03320 2.757662 0.06603774
hsa04914 1.866445 0.06603774
NA NA NA
NA.1 NA NA
NA.2 NA NA
>
I tried the data used in your example code. There is no issue and I got the same dotplot. However, for the project I am working on, pathways included in the dotplot does not match with the results returned by the enrichKEGG function.
Below Kegg_enrich is an object returned by function enrichKEGG.
if I do the following, the data frame will match with dotplot simply because object p only includes data used in generating the dotplot. p <- dotplot(Kegg_enrich) p$data[order(p$data$GeneRatio, decreasing = TRUE)[1:10], c("Description","GeneRatio","Count","p.adjust","qvalue")]
However, if we check the results in kegg_enrich (below), we can notice that Pathways of neurodegeneration - multiple diseases has the highest GeneRatio and the adjusted p-value is <0.05 but was not included in the dotplot.
kegg_enrich@result[order(kegg_enrich@result$Count,decreasing = T)[1:10],c("Description","GeneRatio","Count","p.adjust","qvalue")]
Would you be able to share the object Kegg_enrich? By doing something like save(Kegg_enrich, file="kegg.results.RData")
I can share it if you can let me know your email.
Mail address provided.