Mapping ENTREZID to an OrgDb column other than SYMBOL in clusterProfiler
Hello Dr. Guangchuang Yu,
When using cnetplot() or any other clusterProfiler plotting function, can you change which OrgDb keytype is read when readable = TRUE, or when using the setReadable() function? The OrgDb that I am using has limited information in the SYMBOL keytype; usually it contains only the gene locus ID. I would rather map my gene IDs to the GENENAME keytype, if possible. How would you suggest accomplishing this?
Thank you for your assistance.
Here is a bit of code which exemplifies my problem:
library("AnnotationHub")
ah <- AnnotationHub()
query(ah, "pisum")
orgdb <- ah[["AH66769"]]
library("clusterProfiler")
library("DOSE")
library("enrichplot")
library("pathview")
# gene universe, keytype = ENTREZID
# this is all genes from A. pisum
uni <- keys(orgdb, keytype = "ENTREZID")
# gene subset, keytype = ENTREZID
# this is a subset of all genes from A. pisum
goi <- uni[1:2000]
# GO enrichment
ego <- enrichGO(gene = goi, universe = uni, OrgDb = orgdb,
                ont = "MF", pAdjustMethod = "fdr", pvalueCutoff = 0.05,
                qvalueCutoff = 0.05, keyType = "ENTREZID")
head(ego)
# plot results using cnetplot
cnetplot(ego)
# all node labels are the ENTREZIDs that were the original input
# try again with readable = TRUE
egoT <- enrichGO(gene = goi, universe = uni, OrgDb = orgdb,
                 ont = "MF", pAdjustMethod = "fdr", pvalueCutoff = 0.05,
                 qvalueCutoff = 0.05, keyType = "ENTREZID", readable = TRUE)
cnetplot(egoT)
# most node labels are gene locus numbers, even with readable = TRUE
# look at the SYMBOL keys in orgdb
symbols <- select(orgdb, keys = goi, keytype = "ENTREZID", columns = "SYMBOL")
head(symbols)
# how many are locus numbers?
sum(grepl("LOC", symbols$SYMBOL))
# how many are not locus numbers?
sum(!grepl("LOC", symbols$SYMBOL))
# can I use GENENAME instead of SYMBOL for cnetplot?
genenames <- select(orgdb, keys = goi, keytype = "ENTREZID", columns = "GENENAME")
head(genenames)
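One possible workaround, offered as an assumption rather than an official clusterProfiler feature: enrichment results store the genes of each term as a "/"-separated string in the geneID column of the result table, so the labels can be swapped by hand before plotting. The helper relabel_gene_ids() below is hypothetical (it is not part of clusterProfiler), uses base R only, and runs on made-up toy data. IDs without a label are left unchanged.

```r
# Hypothetical helper, not part of clusterProfiler.
# gene_id_strings: character vector of "/"-separated IDs (one element per term)
# id_to_label:     named character vector, names = IDs, values = labels
relabel_gene_ids <- function(gene_id_strings, id_to_label) {
  id_lists <- strsplit(gene_id_strings, "/", fixed = TRUE)
  vapply(id_lists, function(ids) {
    labels <- unname(id_to_label[ids])
    # keep the original ID where no label is available
    missing <- is.na(labels) | labels == ""
    labels[missing] <- ids[missing]
    paste(labels, collapse = "/")
  }, character(1))
}

# Toy data standing in for an ENTREZID -> GENENAME table (values are made up)
id_to_label <- c("101" = "alpha kinase", "102" = "beta transporter")
relabeled <- relabel_gene_ids(c("101/102", "102/999"), id_to_label)
relabeled
```

With the objects from the post above, this might be applied as ego@result$geneID <- relabel_gene_ids(ego@result$geneID, setNames(genenames$GENENAME, genenames$ENTREZID)) followed by cnetplot(ego). That usage is untested against AH66769; when an ID maps to several names only the first is used, and gene names that themselves contain "/" would be split incorrectly downstream.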
Some OrgDb packages do not have a SYMBOL column at all, for example the yeast (SacCer) package org.Sc.sgd.db, which causes this condition to trip.
Is there a recommended workaround in such cases?
Perhaps there is a way to set the SYMBOL of the OrgDb to be the same as GENENAME or COMMON for the duration of the analysis?
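A guard along the following lines could sidestep the failure: check which columns the OrgDb offers and fall back to another label column when SYMBOL is absent. This is only a sketch in base R; pick_label_column() and its preference order are assumptions for illustration, and the example column vector is made up rather than read from a real OrgDb (in practice it would come from columns(OrgDb)).

```r
# Hypothetical helper: choose the first preferred label column that exists.
pick_label_column <- function(available,
                              preferred = c("SYMBOL", "GENENAME", "COMMON")) {
  hits <- preferred[preferred %in% available]
  if (length(hits) == 0L) {
    return(NA_character_)  # nothing usable: keep the original IDs
  }
  hits[1]
}

# Illustrative column set without SYMBOL (in practice: columns(OrgDb))
available <- c("ORF", "ENTREZID", "GENENAME", "COMMON", "GO")
label_column <- pick_label_column(available)
label_column
```

The chosen column could then be passed to AnnotationDbi::select() to build an ID-to-label table, with readable = TRUE left off so that setReadable() is never called on an OrgDb that lacks SYMBOL.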
I am having the same issue.
I guess the SGD gene IDs correspond to the 'ORF' keytype in org.Sc.sgd.db (someone who knows better should verify that). In any case, this is what I am trying to do:
ggo <- groupGO(gene = gene_list,
               OrgDb = org.Sc.sgd.db,
               ont = "BP",
               keyType = "ORF",
               level = 3,
               readable = TRUE)
This follows the vignette, just to see what happens. In this case, an error occurs:
Error in .testForValidCols(x, cols) :
Invalid columns: SYMBOL. Please use the columns method to see a listing of valid arguments.
In addition: Warning message:
In setReadable(x, OrgDb) :
Fail to convert input geneID to SYMBOL since no SYMBOL information available in the provided OrgDb...
> gene_list
[1] "YNR067C" "YIL162W" "YDL223C" "YPL119C" "YMR084W" "YLR168C" "YHR139C" "YDR033W" "YER056C" "YOR136W" "YDL022W"
[12] "YPR157W" "YJL163C" "YER130C" "YDL204W" "YKL050C" "YNL037C" "YDL039C" "YNL115C" "YML128C" "YAL038W" "YMR053C"
[23] "YDL085W" "YDR222W" "YNCH0011W" "YPR158W" "YNL327W" "YMR085W" "YDR070C" "YOR095C" "YGR088W" "YLR044C" "YLR164W"
[34] "YDL218W" "YJL045W" "YIR014W" "YGR142W" "YAL012W" "YPL223C" "YMR322C" "YIR038C" "YEL030W" "YBR018C" "YPR192W"
[45] "YGL008C" "YPL036W" "YNL124W" "YER124C" "YGR052W" "YJR094C" "YGR239C" "YGR201C" "YOR338W" "YKL107W" "YDL222C"
[56] "YBR033W" "YBL069W" "YER062C" "YBR117C" "YKR076W" "YCL026C-B" "YPL058C" "YEL069C" "YGR256W" "YOR306C" "YPR074C"
[67] "YBL043W" "YBR007C" "YCL025C" "YMR175W-A" "YGR287C" "YBR296C" "YGR236C" "YKL065W-A" "YMR118C" "YMR175W" "YHR046C"
[78] "YDL037C" "YMR174C" "YGR138C" "YGL205W" "YDR256C" "YOR186W" "YHR092C" "YOL136C" "YDR380W" "YAL037W" "YIL160C"
[89] "YIL057C" "YOL124C" "YLL017W" "YCR010C" "YBR238C" "YPL282C" "YOL014W"
> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] clusterProfiler_4.2.2 org.Sc.sgd.db_3.14.0 here_1.0.1 forcats_0.5.1
[5] stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4 readr_2.1.2
[9] tidyr_1.2.0 tibble_3.1.6 ggplot2_3.3.5 tidyverse_1.3.1
[13] topGO_2.46.0 SparseM_1.81 GO.db_3.14.0 AnnotationDbi_1.56.2
[17] graph_1.72.0 DESeq2_1.34.0 SummarizedExperiment_1.24.0 MatrixGenerics_1.6.0
[21] matrixStats_0.61.0 GenomicRanges_1.46.1 GenomeInfoDb_1.30.1 IRanges_2.28.0
[25] S4Vectors_0.32.3 GeneTonic_1.6.1 Biobase_2.54.0 BiocGenerics_0.40.0
loaded via a namespace (and not attached):
[1] rappdirs_0.3.3 AnnotationForge_1.36.0 pkgmaker_0.32.2 bit64_4.0.5 knitr_1.37
[6] DelayedArray_0.20.0 data.table_1.14.2 KEGGREST_1.34.0 RCurl_1.98-1.5 doParallel_1.0.17
[11] generics_0.1.2 RSQLite_2.2.9 shadowtext_0.1.1 bit_4.0.4 tzdb_0.2.0
[16] enrichplot_1.14.1 webshot_0.5.2 xml2_1.3.3 lubridate_1.8.0 httpuv_1.6.5
[21] assertthat_0.2.1 viridis_0.6.2 xfun_0.29 hms_1.1.1 jquerylib_0.1.4
[26] evaluate_0.14 promises_1.2.0.1 TSP_1.1-11 fansi_1.0.2 progress_1.2.2
[31] dendextend_1.15.2 dbplyr_2.1.1 readxl_1.3.1 Rgraphviz_2.38.0 igraph_1.2.11
[36] DBI_1.1.2 geneplotter_1.72.0 htmlwidgets_1.5.4 ellipsis_0.3.2 crosstalk_1.2.0
[41] backports_1.4.1 annotate_1.72.0 gridBase_0.4-7 biomaRt_2.50.3 vctrs_0.3.8
[46] cachem_1.0.6 withr_2.4.3 ggforce_0.3.3 treeio_1.18.1 prettyunits_1.1.1
[51] cluster_2.1.2 DOSE_3.20.1 ape_5.6-1 backbone_2.0.0 lazyeval_0.2.2
[56] crayon_1.4.2 genefilter_1.76.0 pkgconfig_2.0.3 tweenr_1.0.2 nlme_3.1-155
[61] seriation_1.3.1 rlang_1.0.1 lifecycle_1.0.1 miniUI_0.1.1.1 colourpicker_1.1.1
[66] downloader_0.4 registry_0.5-1 filelock_1.0.2 BiocFileCache_2.2.1 GOstats_2.60.0
[71] modelr_0.1.8 cellranger_1.1.0 rprojroot_2.0.2 polyclip_1.10-0 rngtools_1.5.2
[76] Matrix_1.4-0 aplot_0.1.2 reprex_2.0.1 base64enc_0.1-3 GlobalOptions_0.1.2
[81] pheatmap_1.0.12 png_0.1-7 viridisLite_0.4.0 rjson_0.2.21 bitops_1.0-7
[86] shinydashboard_0.7.2 visNetwork_2.1.0 Biostrings_2.62.0 blob_1.2.2 shape_1.4.6
[91] rintrojs_0.3.0 qvalue_2.26.0 gridGraphics_0.5-1 shinyAce_0.4.1 scales_1.1.1
[96] memoise_2.0.1 GSEABase_1.56.0 magrittr_2.0.2 plyr_1.8.6 zlibbioc_1.40.0
[101] threejs_0.3.3 scatterpie_0.1.7 compiler_4.1.2 RColorBrewer_1.1-2 clue_0.3-60
[106] cli_3.1.1 XVector_0.34.0 Category_2.60.0 patchwork_1.1.1 MASS_7.3-55
[111] tidyselect_1.1.1 stringi_1.7.6 shinyBS_0.61 GOSemSim_2.20.0 locfit_1.5-9.4
[116] ggrepel_0.9.1 grid_4.1.2 sass_0.4.0 fastmatch_1.1-3 tools_4.1.2
[121] parallel_4.1.2 circlize_0.4.13 rstudioapi_0.13 foreach_1.5.2 gridExtra_2.3
[126] farver_2.1.0 ggraph_2.0.5 digest_0.6.29 BiocManager_1.30.16 shiny_1.7.1
[131] Rcpp_1.0.8 broom_0.7.12 later_1.3.0 shinyWidgets_0.6.4 httr_1.4.2
[136] ComplexHeatmap_2.10.0 colorspace_2.0-2 rvest_1.0.2 XML_3.99-0.8 fs_1.5.2
[141] splines_4.1.2 tippy_0.1.0 yulab.utils_0.0.4 RBGL_1.70.0 tidytree_0.3.7
[146] expm_0.999-6 graphlayouts_0.8.0 ggplotify_0.1.0 plotly_4.10.0 xtable_1.8-4
[151] ggtree_3.2.1 jsonlite_1.7.3 pcaExplorer_2.20.1 heatmaply_1.3.0 dynamicTreeCut_1.63-1
[156] tidygraph_1.2.0 ggfun_0.0.5 R6_2.5.1 pillar_1.7.0 htmltools_0.5.2
[161] mime_0.12 NMF_0.23.0 glue_1.6.1 fastmap_1.1.0 DT_0.20
[166] BiocParallel_1.28.3 bs4Dash_2.0.3 codetools_0.2-18 fgsea_1.20.0 utf8_1.2.2
[171] lattice_0.20-45 bslib_0.3.1 curl_4.3.2 survival_3.2-13 limma_3.50.0
[176] rmarkdown_2.11 munsell_0.5.0 DO.db_2.9 GetoptLong_1.0.5 GenomeInfoDbData_1.2.7
[181] iterators_1.0.14 haven_2.4.3 reshape2_1.4.4 gtable_0.3.0 shinycssloaders_1.0.0
Bump - I just hit this issue myself.
I create my own AnnotationDbi packages (see makeBioconductorAnnotationDbi if interested) because the species I work with don't have one, so my workaround is simply to add SYMBOL to the package. It would be nice to resolve this issue, though.
This is still a problem for org.Sc.sgd.db, which has no SYMBOL information.