clusterProfiler
enrichKEGG issue: the Description column and ID information are identical in 4.10.0
When using version 4.10.0 of clusterProfiler, the output from the enrichKEGG function shows that the Description column and ID information are identical, which did not happen with version 4.6.0. Is this a bug introduced in the newer version? How can it be resolved? The KEGG database used is internally constructed, with the use_internal_data parameter set to TRUE.
x <- enrichKEGG(geneSet, organism = organism, keyType = 'kegg', pvalueCutoff = 0.05, pAdjustMethod = 'BH', minGSSize = 5, maxGSSize = 2000, qvalueCutoff = 0.2, use_internal_data = T)
Note that you are using an old version of clusterProfiler (4.10.0, current is 4.143). Moreover, your installation appears to mix packages from different Bioconductor releases: clusterProfiler v4.10.x belongs to Bioconductor v3.18.
Let me give some context: when use_internal_data = T, the KEGG information in the package KEGG.db is used. This is not recommended. Because of license issues, re-packaging the KEGG database downloaded from their FTP site has not been allowed for a long time, so the content of KEGG.db has not been updated for many years. That is also the reason the package KEGG.db was finally removed from Bioconductor as of release 3.12.
Since KEGG.db was removed from Bioconductor 3.12, and clusterProfiler 4.10.0 corresponds to a later Bioconductor release (3.18), I conclude you have somehow mixed up your installation.
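One way to look for such a mixed installation is to list the installed versions of the packages involved. The following is a minimal sketch in base R, not part of clusterProfiler; the helper names installed_version() and in_series() are made up for illustration. If BiocManager is installed, BiocManager::valid() should give a fuller report of packages that do not match the Bioconductor release in use.

```r
# Sketch: report versions relevant to the mixed-installation diagnosis.
# installed_version() and in_series() are hypothetical helpers (base R only).

# Installed version of a package as a string, or NA if it is not installed.
installed_version <- function(pkg) {
  if (nzchar(system.file(package = pkg))) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# TRUE if a version string belongs to a major.minor series such as "4.10".
in_series <- function(version, series) {
  if (is.na(version)) return(FALSE)
  v <- unclass(package_version(version))[[1]]
  s <- unclass(package_version(series))[[1]]
  identical(v[seq_along(s)], s)
}

pkgs <- c("clusterProfiler", "DOSE", "KEGG.db")
report <- vapply(pkgs, installed_version, character(1))
print(report)

# clusterProfiler 4.10.x goes with Bioconductor 3.18, whereas KEGG.db was
# removed as of Bioconductor 3.12, so having both installed is suspicious.
if (in_series(report[["clusterProfiler"]], "4.10") && !is.na(report[["KEGG.db"]])) {
  message("KEGG.db is installed next to clusterProfiler 4.10.x: possibly a mixed installation")
}

# With BiocManager installed and network access, a fuller check would be:
# BiocManager::valid()
```

If KEGG.db shows a version while clusterProfiler is in the 4.10 series, that would be consistent with packages from different Bioconductor releases sharing one library.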
Yet, using clusterProfiler 4.10.x, when setting use_internal_data = FALSE, you can still query KEGG through its API, and then everything looks fine to me...
> library(clusterProfiler)
> library(org.Hs.eg.db)
>
> ## load and prepare example data / results
> data(geneList, package="DOSE")
>
> up <- names(geneList)[abs(geneList) > 2]
>
> ## run ORA using KEGG pathways
> res.up <- enrichKEGG(gene = up,
+ organism = "hsa",
+ keyType = "kegg",
+ pvalueCutoff = 0.05,
+ pAdjustMethod = "BH",
+ minGSSize = 10,
+ maxGSSize = 500,
+ qvalueCutoff = 0.2,
+ use_internal_data = FALSE)
Reading KEGG annotation online: "https://rest.kegg.jp/link/hsa/pathway"...
Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/hsa"...
>
> res.up <- setReadable(res.up, 'org.Hs.eg.db', keyType = "ENTREZID")
> head(res.up)
                                     category
hsa04110                   Cellular Processes
hsa04114                   Cellular Processes
hsa04218                   Cellular Processes
hsa04061 Environmental Information Processing
hsa03320                   Organismal Systems
hsa04814                   Cellular Processes
                                 subcategory       ID
hsa04110               Cell growth and death hsa04110
hsa04114               Cell growth and death hsa04114
hsa04218               Cell growth and death hsa04218
hsa04061 Signaling molecules and interaction hsa04061
hsa03320                    Endocrine system hsa03320
hsa04814                       Cell motility hsa04814
                                                           Description
hsa04110                                                    Cell cycle
hsa04114                                                Oocyte meiosis
hsa04218                                           Cellular senescence
hsa04061 Viral protein interaction with cytokine and cytokine receptor
hsa03320                                        PPAR signaling pathway
hsa04814                                                Motor proteins
         GeneRatio  BgRatio       pvalue     p.adjust       qvalue
hsa04110    15/106 158/8865 4.779149e-10 1.022738e-07 1.001106e-07
hsa04114    10/106 139/8865 5.746555e-06 6.148814e-04 6.018761e-04
hsa04218    10/106 157/8865 1.688773e-05 1.204658e-03 1.179178e-03
hsa04061     8/106 100/8865 2.400030e-05 1.284016e-03 1.256858e-03
hsa03320     7/106  76/8865 3.179854e-05 1.360977e-03 1.332191e-03
hsa04814    10/106 197/8865 1.167349e-04 4.163545e-03 4.075482e-03
                                                                                         geneID
hsa04110 CDC45/CDC20/CCNB2/NDC80/CCNA2/CDK1/MAD2L1/CDT1/TTK/AURKB/CHEK1/TRIP13/CCNB1/MCM5/PTTG1
hsa04114                             CDC20/CCNB2/CDK1/MAD2L1/CALML5/AURKA/CCNB1/PTTG1/ITPR1/PGR
hsa04218                          FOXM1/MYBL2/CCNB2/CCNA2/CDK1/CALML5/CHEK1/CCNB1/CACNA1D/ITPR1
hsa04061                                    CXCL10/CXCL13/CXCL11/CXCL9/CCL18/CCL8/CXCL14/CX3CR1
hsa03320                                              MMP1/FADS2/ADIPOQ/PCK1/FABP4/HMGCS2/PLIN1
hsa04814                        KIF23/CENPE/KIF18A/KIF11/KIFC1/KIF18B/KIF20A/KIF4A/MYH11/DNALI1
         Count
hsa04110    15
hsa04114    10
hsa04218    10
hsa04061     8
hsa03320     7
hsa04814    10
>
> packageVersion("clusterProfiler")
[1] ‘4.10.1’
> BiocManager::version()
[1] ‘3.18’
> sessionInfo()
R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
time zone: Europe/Amsterdam
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] org.Hs.eg.db_3.18.0 AnnotationDbi_1.64.1 IRanges_2.36.0
[4] S4Vectors_0.40.2 Biobase_2.62.0 BiocGenerics_0.48.1
[7] clusterProfiler_4.10.1
loaded via a namespace (and not attached):
[1] DBI_1.2.2 bitops_1.0-7 shadowtext_0.1.3
[4] gson_0.1.0 gridExtra_2.3 rlang_1.1.3
[7] magrittr_2.0.3 DOSE_3.28.2 compiler_4.3.0
[10] RSQLite_2.3.6 png_0.1-8 vctrs_0.6.5
[13] reshape2_1.4.4 stringr_1.5.1 pkgconfig_2.0.3
[16] crayon_1.5.2 fastmap_1.1.1 XVector_0.42.0
[19] ggraph_2.2.1 utf8_1.2.4 HDO.db_0.99.1
[22] enrichplot_1.23.1.992 purrr_1.0.2 bit_4.0.5
[25] zlibbioc_1.48.2 cachem_1.0.8 aplot_0.2.2
[28] GenomeInfoDb_1.38.8 jsonlite_1.8.8 blob_1.2.4
[31] BiocParallel_1.36.0 tweenr_2.0.3 parallel_4.3.0
[34] R6_2.5.1 stringi_1.8.3 RColorBrewer_1.1-3
[37] GOSemSim_2.29.1.001 Rcpp_1.0.12 Matrix_1.6-5
[40] splines_4.3.0 igraph_2.0.3 tidyselect_1.2.1
[43] qvalue_2.34.0 viridis_0.6.5 codetools_0.2-20
[46] lattice_0.22-6 tibble_3.2.1 plyr_1.8.9
[49] treeio_1.26.0 withr_3.0.0 KEGGREST_1.42.0
[52] gridGraphics_0.5-1 scatterpie_0.2.1 polyclip_1.10-6
[55] Biostrings_2.70.3 BiocManager_1.30.22 pillar_1.9.0
[58] ggtree_3.10.1 ggfun_0.1.4 generics_0.1.3
[61] RCurl_1.98-1.14 ggplot2_3.5.0 munsell_0.5.1
[64] scales_1.3.0 tidytree_0.4.6 glue_1.7.0
[67] lazyeval_0.2.2 tools_4.3.0 data.table_1.15.4
[70] fgsea_1.28.0 fs_1.6.3 graphlayouts_1.1.1
[73] fastmatch_1.1-4 tidygraph_1.3.1 cowplot_1.1.3
[76] grid_4.3.0 tidyr_1.3.1 ape_5.7-1
[79] colorspace_2.1-0 nlme_3.1-164 GenomeInfoDbData_1.2.11
[82] patchwork_1.2.0 ggforce_0.4.2 cli_3.6.2
[85] fansi_1.0.6 viridisLite_0.4.2 dplyr_1.1.4
[88] gtable_0.3.4 yulab.utils_0.1.4 digest_0.6.35
[91] ggrepel_0.9.5 ggplotify_0.1.2 farver_2.1.1
[94] memoise_2.0.1 lifecycle_1.0.4 httr_1.4.7
[97] GO.db_3.18.0 bit64_4.0.5 MASS_7.3-60.0.1
>
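To confirm whether a given result shows the symptom from the question, the ID and Description columns can be compared directly. This is a rough sketch on toy data frames (the example rows reuse pathway IDs and names from the output above); description_equals_id() is a hypothetical helper, not a clusterProfiler function.

```r
# Sketch: detect the symptom "Description is identical to ID".
# description_equals_id() is a hypothetical helper, not part of clusterProfiler.
description_equals_id <- function(df) {
  stopifnot(all(c("ID", "Description") %in% names(df)))
  # TRUE only if there is at least one row and every Description repeats its ID.
  nrow(df) > 0 &&
    isTRUE(all(as.character(df$ID) == as.character(df$Description)))
}

# Toy rows mimicking a healthy result and the problematic one.
good <- data.frame(ID = c("hsa04110", "hsa04114"),
                   Description = c("Cell cycle", "Oocyte meiosis"))
bad <- data.frame(ID = c("hsa04110", "hsa04114"),
                  Description = c("hsa04110", "hsa04114"))

description_equals_id(good)  # FALSE
description_equals_id(bad)   # TRUE
```

For a real result, the same check could presumably be applied to as.data.frame(res.up).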
Thanks!