clusterProfiler
clusterProfiler copied to clipboard
dotplot() error after compareCluster() for fun = "groupGO" with data(gcSample)
Describe your issue
Even using the example from the documentation I am getting errors:
> xx <- enrichplot::pairwise_termsim(xx)
> clusterProfiler::dotplot(xx)
> xx <- compareCluster(gcSample,
+ fun = "groupGO",
+ OrgDb= 'org.Hs.eg.db',
+ ont = "BP",
+ readable = TRUE)
> clusterProfiler::dotplot(xx)
Error in `geom_point()`:
! Problem while computing aesthetics.
ℹ Error occurred in the 1st layer.
Caused by error in `compute_aesthetics()`:
! Aesthetics are not valid data columns.
✖ The following aesthetics are invalid:
✖ `colour = p.adjust`
ℹ Did you mistype the name of a data column or forget to add `after_stat()`?
Run `rlang::last_trace()` to see where the error occurred.
But enrichKEGG and enrichGO works as expected
data(gcSample)
xx <- compareCluster(gcSample, fun="enrichKEGG",
organism="hsa", pvalueCutoff=0.05)
xx <- enrichplot::pairwise_termsim(xx)
clusterProfiler::dotplot(xx)
Warning messages:
1: In utils::download.file(url, quiet = TRUE, method = method, ...) :
the 'wininet' method is deprecated for http:// and https:// URLs
2: In utils::download.file(url, quiet = TRUE, method = method, ...) :
the 'wininet' method is deprecated for http:// and https:// URLs
sessioninfo:
> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.utf8 LC_CTYPE=English_United Kingdom.utf8 LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C LC_TIME=English_United Kingdom.utf8
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] TxDb.Mmusculus.UCSC.mm10.knownGene_3.10.0 GenomicFeatures_1.50.4
[3] GenomicRanges_1.50.2 GenomeInfoDb_1.34.9
[5] clusterProfiler_4.6.2 org.Mm.eg.db_3.16.0
[7] AnnotationDbi_1.60.2 IRanges_2.32.0
[9] S4Vectors_0.36.2 Biobase_2.58.0
[11] BiocGenerics_0.44.0
loaded via a namespace (and not attached):
[1] shadowtext_0.1.2 fastmatch_1.1-3 BiocFileCache_2.6.1 plyr_1.8.8
[5] igraph_1.4.2 lazyeval_0.2.2 splines_4.2.2 BiocParallel_1.32.6
[9] pathview_1.38.0 ggplot2_3.4.2 digest_0.6.31 yulab.utils_0.0.6
[13] GOSemSim_2.24.0 viridis_0.6.2 GO.db_3.16.0 fansi_1.0.4
[17] magrittr_2.0.3 memoise_2.0.1 Biostrings_2.66.0 graphlayouts_0.8.4
[21] matrixStats_0.63.0 enrichplot_1.18.3 prettyunits_1.1.1 colorspace_2.1-0
[25] blob_1.2.4 rappdirs_0.3.3 ggrepel_0.9.3 dplyr_1.1.2
[29] crayon_1.5.2 RCurl_1.98-1.12 jsonlite_1.8.4 graph_1.76.0
[33] scatterpie_0.1.8 ape_5.7-1 glue_1.6.2 polyclip_1.10-4
[37] gtable_0.3.3 zlibbioc_1.44.0 XVector_0.38.0 DelayedArray_0.24.0
[41] graphite_1.44.0 Rgraphviz_2.42.0 scales_1.2.1 DOSE_3.24.2
[45] futile.options_1.0.1 DBI_1.1.3 Rcpp_1.0.10 viridisLite_0.4.1
[49] progress_1.2.2 gridGraphics_0.5-1 tidytree_0.4.2 bit_4.0.5
[53] reactome.db_1.82.0 httr_1.4.5 fgsea_1.24.0 RColorBrewer_1.1-3
[57] pkgconfig_2.0.3 XML_3.99-0.14 farver_2.1.1 dbplyr_2.3.2
[61] utf8_1.2.3 ggplotify_0.1.0 tidyselect_1.2.0 labeling_0.4.2
[65] rlang_1.1.0 reshape2_1.4.4 munsell_0.5.0 tools_4.2.2
[69] cachem_1.0.7 downloader_0.4 cli_3.6.0 generics_0.1.3
[73] RSQLite_2.3.0 gson_0.1.0 stringr_1.5.0 fastmap_1.1.1
[77] yaml_2.3.7 ggtree_3.6.2 org.Hs.eg.db_3.16.0 bit64_4.0.5
[81] tidygraph_1.2.3 purrr_1.0.1 KEGGREST_1.38.0 ggraph_2.1.0
[85] ReactomePA_1.42.0 nlme_3.1-160 formatR_1.14 KEGGgraph_1.58.3
[89] aplot_0.1.10 xml2_1.3.3 biomaRt_2.54.1 compiler_4.2.2
[93] rstudioapi_0.14 filelock_1.0.2 curl_5.0.0 png_0.1-8
[97] treeio_1.22.0 tibble_3.2.1 tweenr_2.0.2 stringi_1.7.12
[101] futile.logger_1.4.3 lattice_0.21-8 Matrix_1.5-3 vctrs_0.6.0
[105] pillar_1.9.0 lifecycle_1.0.3 data.table_1.14.8 cowplot_1.1.1
[109] bitops_1.0-7 patchwork_1.1.2 rtracklayer_1.58.0 qvalue_2.30.0
[113] R6_2.5.1 BiocIO_1.8.0 gridExtra_2.3 codetools_0.2-18
[117] lambda.r_1.2.4 MASS_7.3-58.3 SummarizedExperiment_1.28.0 rjson_0.2.21
[121] withr_2.5.0 GenomicAlignments_1.34.1 Rsamtools_2.14.0 GenomeInfoDbData_1.2.9
[125] parallel_4.2.2 hms_1.1.3 VennDiagram_1.7.3 grid_4.2.2
[129] ggfun_0.0.9 tidyr_1.3.0 HDO.db_0.99.1 MatrixGenerics_1.10.0
[133] ggforce_0.4.1 restfulr_0.0.15
Please note that you misinterpreted what groupGO is doing, and therefore you should achieve your goal slightly different!
Having said this, regarding the error: reason for it is that the results from the function groupGO indeed lack the column p.adjust. As a consequence, information required for plotting is lacking, hence the error.
Using the groupGO example code:
> data(gcSample)
> yy <- groupGO(gcSample[[1]], 'org.Hs.eg.db', ont="BP", level=2)
> yy
GO BP Profiles at level 2 of 216 Homo sapiens genes
> head(as.data.frame(yy))
ID Description Count GeneRatio
GO:0000003 GO:0000003 reproduction 15 15/216
GO:0002376 GO:0002376 immune system process 54 54/216
GO:0008152 GO:0008152 metabolic process 119 119/216
GO:0009987 GO:0009987 cellular process 190 190/216
GO:0016032 GO:0016032 viral process 10 10/216
GO:0022414 GO:0022414 reproductive process 15 15/216
geneID
GO:0000003 5266/2175/993/653/6665/2709/3024/2529/5467/10733/7216/8549/5156/59272/51314
GO:0002376 5266/2175/5871/5167/119/7850/653/6364/671/8685/2840/8581/5319/3559/6541/1137/2208/8326/2921/923/3868/10411/22861/29851/2529/259197/10663/3576/2533/2047/961/5074/10614/2813/6597/6279/1521/8970/6374/25893/1281/926/28823/4589/50852/23580/79148/80380/53347/26279/9464/79931/51311/51176
GO:0008152 4597/5266/2175/3931/6770/993/229/5871/4771/5167/3757/7850/653/51442/671/5080/3899/8685/3101/2005/3827/8292/6665/9355/11184/5319/7136/1608/8710/1137/2208/8092/3595/10001/30848/4857/4115/9863/8814/9942/64211/7368/3024/923/333/6317/22861/8707/4109/3159/2529/11283/2780/5467/3697/8817/10733/3576/7216/8514/2047/10209/6625/58487/961/5074/10614/23185/2915/23321/199731/6597/6279/586/1521/23109/8123/10631/25893/140883/1281/2016/5156/699/28823/139081/26085/862/50852/23390/3046/56649/10324/79148/59272/79852/80380/11182/29881/64180/79725/53347/26279/51302/9464/7761/2001/9496/79931/80339/11173/80020/51311/51378/59336/53637/51176/5317/2018
GO:0009987 4597/7111/2175/755/23046/3931/6770/993/229/55800/10232/5871/4771/2173/27239/5167/3757/119/7850/51678/653/6364/51442/671/366/5080/3899/9369/8685/3101/2005/10874/3827/8292/6665/9355/2709/2840/8581/11184/5319/3559/7136/1608/8710/2558/22808/6541/1137/2208/8092/28513/3595/10001/6564/30848/4857/4115/8326/9863/8814/2921/9942/64211/7368/3024/923/333/6317/3868/10411/22861/8707/4109/9362/29851/3159/2529/27077/27180/11283/2780/5467/8793/4902/8828/259197/8817/10733/10663/3576/491/9568/7216/8514/2533/3752/2047/10209/6625/58487/23148/221692/961/8549/5074/10614/23185/2915/23321/199731/2813/6597/6279/586/4308/2676/8970/3755/1183/23109/23114/8123/146691/81569/6374/10631/25893/140883/1281/2016/5156/926/699/22798/6251/667/56926/23092/28823/139081/26085/26707/862/4589/50852/23390/3046/23580/54801/56649/10324/57348/64221/51458/29767/5101/79148/59272/23630/79852/80380/11182/54768/29881/29967/79725/54798/51314/53347/26279/51302/9464/7761/80258/2001/9496/79931/80339/11173/80020/29119/51311/51378/59336/53637/26692/51176/5317/2018
GO:0016032 6541/6317/10663/3576/10614/6597/25893/56649/59272/51176
GO:0022414 5266/2175/993/653/6665/2709/3024/2529/5467/10733/7216/8549/5156/59272/51314
>
Note that the column p.adjust is indeed missing!
Yet, based on the description of the function groupGO (open the help page: ?groupGO, also check here) it is doing something else than you expect it would be doing;
-
For each input gene, the function
groupGOextracts the annotated GO terms at a specific level, and that's it. -
You rather expected that it would perform an over-representation analysis for those level-specific terms.
In order to achieve your goal you should first perform a enrichGO analysis using the annotation at all levels. Next, you apply the function gofilter to subset the results at a specific annotation level.
Please note that @GuangchuangYu, the author of clusterProfiler does not recommend the use of gofilter for data interpretation, but rather the function simplify. See: https://github.com/YuLab-SMU/clusterProfiler/issues/30
Anyway:
> data(gcSample)
> xx <- compareCluster(gcSample,
+ fun = "enrichGO",
+ OrgDb= 'org.Hs.eg.db',
+ ont = "BP",
+ readable = TRUE,
+ pvalueCutoff=0.05)
>
> xx
#
# Result of Comparing 8 gene clusters
#
#.. @fun enrichGO
#.. @geneClusters List of 8
$ X1: chr [1:216] "4597" "7111" "5266" "2175" ...
$ X2: chr [1:805] "23450" "5160" "7126" "26118" ...
$ X3: chr [1:392] "894" "7057" "22906" "3339" ...
$ X4: chr [1:838] "5573" "7453" "5245" "23450" ...
$ X5: chr [1:929] "5982" "7318" "6352" "2101" ...
$ X6: chr [1:585] "5337" "9295" "4035" "811" ...
$ X7: chr [1:582] "2621" "2665" "5690" "3608" ...
$ X8: chr [1:237] "2665" "4735" "1327" "3192" ...
#...Result 'data.frame': 1002 obs. of 10 variables:
$ Cluster : Factor w/ 8 levels "X1","X2","X3",..: 1 1 1 1 1 1 1 1 1 1 ...
$ ID : chr "GO:0021978" "GO:0086005" "GO:0086091" "GO:1990266" ...
$ Description: chr "telencephalon regionalization" "ventricular cardiac muscle cell action potential" "regulation of heart rate by cardiac conduction" "neutrophil migration" ...
$ GeneRatio : chr "4/198" "5/198" "5/198" "8/198" ...
$ BgRatio : chr "13/18870" "35/18870" "38/18870" "129/18870" ...
$ pvalue : num 7.81e-06 3.04e-05 4.58e-05 6.58e-05 6.59e-05 ...
$ p.adjust : num 0.0207 0.0322 0.0322 0.0322 0.0322 ...
$ qvalue : num 0.0191 0.0297 0.0297 0.0297 0.0297 ...
$ geneID : chr "PAX6/LHX2/EMX1/EMX2" "SCN3B/KCNH2/KCND3/KCNE5/CTNNA3" "SCN3B/KCNH2/KCND3/KCNE5/CTNNA3" "CCL20/PLA2G1B/CXCL3/FUT7/CXCL8/GP2/S100A8/CXCL5" ...
$ Count : int 4 5 5 8 9 7 4 3 12 9 ...
#.. number of enriched terms found for each gene cluster:
#.. X1: 15
#.. X2: 290
#.. X3: 84
#.. X4: 97
#.. X5: 111
#.. X6: 97
#.. X7: 138
#.. X8: 170
#
#...Citation
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou,
W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
The Innovation. 2021, 2(3):100141
>
> gofilter(xx, level = 4)
#
# Result of Comparing 8 gene clusters
#
#.. @fun enrichGO
#.. @geneClusters List of 8
$ X1: chr [1:216] "4597" "7111" "5266" "2175" ...
$ X2: chr [1:805] "23450" "5160" "7126" "26118" ...
$ X3: chr [1:392] "894" "7057" "22906" "3339" ...
$ X4: chr [1:838] "5573" "7453" "5245" "23450" ...
$ X5: chr [1:929] "5982" "7318" "6352" "2101" ...
$ X6: chr [1:585] "5337" "9295" "4035" "811" ...
$ X7: chr [1:582] "2621" "2665" "5690" "3608" ...
$ X8: chr [1:237] "2665" "4735" "1327" "3192" ...
#...Result 'data.frame': 158 obs. of 10 variables:
$ Cluster : Factor w/ 8 levels "X1","X2","X3",..: 1 1 1 2 2 2 2 2 2 2 ...
$ ID : chr "GO:0061351" "GO:0021543" "GO:0097529" "GO:0051304" ...
$ Description: chr "neural precursor cell proliferation" "pallium development" "myeloid leukocyte migration" "chromosome separation" ...
$ GeneRatio : chr "9/198" "9/198" "10/198" "17/757" ...
$ BgRatio : chr "166/18870" "191/18870" "242/18870" "81/18870" ...
$ pvalue : num 6.59e-05 1.92e-04 2.47e-04 1.75e-08 2.42e-08 ...
$ p.adjust : num 3.22e-02 4.37e-02 4.37e-02 1.27e-05 1.53e-05 ...
$ qvalue : num 2.97e-02 4.03e-02 4.03e-02 1.05e-05 1.28e-05 ...
$ geneID : chr "NF2/PAX6/LHX2/FZD9/LHX5/EPHB1/EMX1/LEF1/EMX2" "NF2/PAX6/LHX2/LHX5/PHACTR1/COL3A1/EMX1/LEF1/EMX2" "CCL20/PLA2G1B/CXCL3/FUT7/CXCL8/CD47/GP2/S100A8/CXCL5/MMP28" "CUL3/NCAPD2/PLSCR1/CDC20/APC/BUB1B/DLGAP5/SMC2/ESPL1/KNTC1/BUB1/BIRC5/NCAPD3/ZWILCH/NCAPG/NCAPG2/SPDL1" ...
$ Count : int 9 9 10 17 37 42 31 6 37 22 ...
#.. number of enriched terms found for each gene cluster:
#.. X1: 3
#.. X2: 51
#.. X3: 15
#.. X4: 13
#.. X5: 27
#.. X6: 14
#.. X7: 14
#.. X8: 21
#
#...Citation
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou,
W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
The Innovation. 2021, 2(3):100141
>
> simplify(xx)
#
# Result of Comparing 8 gene clusters
#
#.. @fun enrichGO
#.. @geneClusters List of 8
$ X1: chr [1:216] "4597" "7111" "5266" "2175" ...
$ X2: chr [1:805] "23450" "5160" "7126" "26118" ...
$ X3: chr [1:392] "894" "7057" "22906" "3339" ...
$ X4: chr [1:838] "5573" "7453" "5245" "23450" ...
$ X5: chr [1:929] "5982" "7318" "6352" "2101" ...
$ X6: chr [1:585] "5337" "9295" "4035" "811" ...
$ X7: chr [1:582] "2621" "2665" "5690" "3608" ...
$ X8: chr [1:237] "2665" "4735" "1327" "3192" ...
#...Result 'data.frame': 529 obs. of 10 variables:
$ Cluster : Factor w/ 8 levels "X1","X2","X3",..: 1 1 1 1 1 1 1 1 1 1 ...
$ ID : chr "GO:0021978" "GO:0086005" "GO:0086091" "GO:1990266" ...
$ Description: chr "telencephalon regionalization" "ventricular cardiac muscle cell action potential" "regulation of heart rate by cardiac conduction" "neutrophil migration" ...
$ GeneRatio : chr "4/198" "5/198" "5/198" "8/198" ...
$ BgRatio : chr "13/18870" "35/18870" "38/18870" "129/18870" ...
$ pvalue : num 7.81e-06 3.04e-05 4.58e-05 6.58e-05 6.59e-05 ...
$ p.adjust : num 0.0207 0.0322 0.0322 0.0322 0.0322 ...
$ qvalue : num 0.0191 0.0297 0.0297 0.0297 0.0297 ...
$ geneID : chr "PAX6/LHX2/EMX1/EMX2" "SCN3B/KCNH2/KCND3/KCNE5/CTNNA3" "SCN3B/KCNH2/KCND3/KCNE5/CTNNA3" "CCL20/PLA2G1B/CXCL3/FUT7/CXCL8/GP2/S100A8/CXCL5" ...
$ Count : int 4 5 5 8 9 7 3 12 9 3 ...
#.. number of enriched terms found for each gene cluster:
#.. X1: 11
#.. X2: 148
#.. X3: 51
#.. X4: 65
#.. X5: 71
#.. X6: 44
#.. X7: 63
#.. X8: 76
#
#...Citation
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou,
W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
The Innovation. 2021, 2(3):100141
>
> dotplot( gofilter(xx, level = 4), showCategory = 5, font.size = 6 )
> dotplot( simplify(xx), showCategory = 5, font.size = 6 )
Lastly, in this context it may be good to know that the function dropGO can be used to drop GO terms or a specific level of GO terms. (gofilter restricts the results to a specific level).
> dotplot( dropGO(xx, level = 4), showCategory = 5, font.size = 6 )