Difference between Kobas-i, Reactome website and ReactomePA (may be caused by duplicated background genes)

Open Freya-Cui-2020 opened this issue 2 years ago • 0 comments

Hello,

I have 482 Ensembl genes (411 converted to Entrez IDs using bitr) on which I want to perform Reactome pathway gene enrichment analysis.

I ran ReactomePA and kobas-i on the same gene list. With a q-value cutoff of < 0.1, kobas-i returned 7 pathways:

  ID             Description                                     GeneRatio  Bg    pvalue        p.adjust
1 R-HSA-3700989  Transcriptional_Regulation_by_TP53              19/482     356   5.228496e-06  0.00383667
2 R-HSA-74160    Gene_expression_(Transcription)                 43/482     1448  4.022863e-05  0.02108555
3 R-HSA-1362409  Mitochondrial_iron-sulfur_cluster_biogenesis    4/482      11    6.167911e-05  0.02828758
4 R-HSA-73857    RNA_Polymerase_II_Transcription                 38/482     1316  1.990795e-04  0.06086855
5 R-HSA-212436   Generic_Transcription_Pathway                   35/482     1193  2.684548e-04  0.07035433
6 R-HSA-5689896  Ovarian_tumor_domain_proteases                  5/482      38    4.633071e-04  0.09443742
7 R-HSA-2426168  Activation_of_gene_expression_by_SREBF_(SREBP)  5/482      40    5.736862e-04  0.09567521

ReactomePA gave 4 pathways with a q-value < 0.5:

ID             Description                                     GeneRatio  BgRatio    pvalue        p.adjust
R-HSA-1362409  Mitochondrial iron-sulfur cluster biogenesis    4/215      13/10856   9.296164e-05  0.0487148
R-HSA-3700989  Transcriptional Regulation by TP53              19/215     365/10856  1.153013e-04  0.0487148
R-HSA-5689896  Ovarian tumor domain proteases                  5/215      38/10856   8.571709e-04  0.2414365
R-HSA-2426168  Activation of gene expression by SREBF (SREBP)  5/215      42/10856   1.362514e-03  0.2878311
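For the pathways that both tools report, the p-values still differ. Over-representation tests of this kind are commonly computed as a hypergeometric tail probability, so the GeneRatio denominator (482 vs 215) and the background counts feed directly into the result. The base R sketch below plugs in the ReactomePA counts for R-HSA-1362409. The helper name `ora_pvalue` is hypothetical, and whether kobas-i uses exactly this test and background is an assumption.

```r
# Hypothetical helper: one-sided hypergeometric (over-representation) p-value.
# k = input genes in the pathway, n = input genes tested,
# K = background genes in the pathway, N = background size.
ora_pvalue <- function(k, n, K, N) {
  # P(X >= k) when drawing n genes from N, of which K belong to the pathway
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

# Counts from the ReactomePA row for R-HSA-1362409: 4/215 and 13/10856
p_reactomepa <- ora_pvalue(k = 4, n = 215, K = 13, N = 10856)

# Same overlap, but with all 482 input genes in the denominator.
# The background size 10856 is kept here purely for illustration.
p_alt <- ora_pvalue(k = 4, n = 482, K = 13, N = 10856)

print(c(reactomepa = p_reactomepa, alt_denominator = p_alt))
```

With the overlap held fixed, a larger denominator makes the same 4 hits less surprising, so the second p-value is larger than the first.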

R-HSA-74160, R-HSA-73857 and R-HSA-212436 do not appear in the ReactomePA results. Meanwhile, the Reactome website gave the same enrichment results as kobas-i. To find the reason, I checked three things:

First, I checked whether the pathways exist in reactome.db.

get("R-HSA-74160", reactomePATHID2NAME)
[1] "Homo sapiens: Gene expression (Transcription)"
get("R-HSA-212436", reactomePATHID2NAME)
[1] "Homo sapiens: Generic Transcription Pathway"
get("R-HSA-73857", reactomePATHID2NAME)
[1] "Homo sapiens: RNA Polymerase II Transcription"

Second, I ruled out the possibility that the difference was caused by the gene ID conversion from ENSEMBL to ENTREZ:

(df[2,8] %>% strsplit("\\|"))[[1]] %in% (entrz2$ENSEMBL %>% as.vector()) %>% table()
FALSE  TRUE
    1    42

42 of the 43 enriched genes for this pathway had an ENTREZ ID.
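The same check can be written so that it also reports which ID failed to convert. This is a minimal sketch on toy data: `gene_field` and `id_map` are hypothetical stand-ins for one cell of the kobas-i output and for a bitr() result with ENSEMBL and ENTREZID columns.

```r
# Hypothetical pipe-delimited gene field, as in one cell of the kobas-i output
gene_field <- "ENSG_A|ENSG_B|ENSG_C"

# Hypothetical mapping table shaped like bitr() output (ENSEMBL -> ENTREZID)
id_map <- data.frame(
  ENSEMBL  = c("ENSG_A", "ENSG_C"),
  ENTREZID = c("101", "103"),
  stringsAsFactors = FALSE
)

# fixed = TRUE treats "|" literally, so no regex escaping is needed
genes <- strsplit(gene_field, "|", fixed = TRUE)[[1]]

# Logical vector: TRUE where the Ensembl ID has an Entrez mapping
mapped <- genes %in% id_map$ENSEMBL
print(table(mapped))

# The IDs that were lost in the conversion
unmapped <- genes[!mapped]
print(unmapped)
```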

Third, I checked the number of background genes in reactome.db:

length(get("R-HSA-74160", reactomePATHID2EXTID))
[1] 1837
length(get("R-HSA-74160", reactomePATHID2EXTID) %>% unique())
[1] 1506

It seems that the background genes are duplicated.
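Even after removing duplicates, R-HSA-74160 is a large gene set (1506 genes here, 1448 in kobas-i). Besides duplication, one thing that may be worth ruling out is a gene-set size limit: `enrichPathway()` has `minGSSize` and `maxGSSize` arguments, and to my knowledge the defaults are 10 and 500, so larger pathways would be skipped before testing. The base R sketch below applies such a filter to the Bg column of the kobas-i table. `keep_by_size` is a hypothetical helper, not ReactomePA's code, and the default limits are an assumption to confirm with `?enrichPathway` for the installed version.

```r
# Background sizes (Bg column) from the kobas-i table in this post
bg_size <- c(
  "R-HSA-3700989" = 356,
  "R-HSA-74160"   = 1448,
  "R-HSA-1362409" = 11,
  "R-HSA-73857"   = 1316,
  "R-HSA-212436"  = 1193,
  "R-HSA-5689896" = 38,
  "R-HSA-2426168" = 40
)

# Hypothetical helper mimicking a gene-set size filter.
# The limits 10 and 500 are assumed defaults; check ?enrichPathway.
keep_by_size <- function(sizes, min_size = 10, max_size = 500) {
  sizes >= min_size & sizes <= max_size
}

kept <- keep_by_size(bg_size)

# Pathways that such a filter would drop before any test is run
dropped <- names(bg_size)[!kept]
print(dropped)
```

With these limits, the dropped pathways are R-HSA-74160, R-HSA-73857 and R-HSA-212436, the same three that are missing from the ReactomePA result. If that turns out to be the cause, raising `maxGSSize` in the `enrichPathway()` call would be the thing to try.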

My question: I suspect the difference is caused by the duplicated genes in reactome.db. How can I avoid this? I also want to draw a cnetplot of the Reactome enrichment results from kobas-i. If the duplication problem cannot be solved, how can I draw that plot?
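For the plotting part, a possible starting point is to reshape the kobas-i table into a named list of gene sets and a pathway-gene edge list, which is the structure a gene-concept network is drawn from. The sketch below uses base R on toy data: the `kobas` data frame, its `Genes` column and the gene names are hypothetical placeholders. As far as I know, `cnetplot()` in enrichplot also accepts a named list of gene vectors, but that is an assumption to verify against the installed version.

```r
# Hypothetical kobas-i style result: one row per pathway, genes joined by "|"
kobas <- data.frame(
  ID          = c("R-HSA-3700989", "R-HSA-1362409"),
  Description = c("Transcriptional Regulation by TP53",
                  "Mitochondrial iron-sulfur cluster biogenesis"),
  Genes       = c("GENE_A|GENE_B|GENE_C", "GENE_C|GENE_D"),
  stringsAsFactors = FALSE
)

# Named list: one character vector of genes per pathway
gene_sets <- strsplit(kobas$Genes, "|", fixed = TRUE)
names(gene_sets) <- kobas$Description

# Long-format edge list (pathway, gene), one row per membership
edges <- data.frame(
  pathway = rep(names(gene_sets), lengths(gene_sets)),
  gene    = unlist(gene_sets, use.names = FALSE),
  stringsAsFactors = FALSE
)
print(edges)
```

The edge list can be passed to a general graph package, and `gene_sets` is the form to try with `cnetplot()` if the list input works as assumed.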

Attached is my R sessionInfo:

sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936 LC_CTYPE=Chinese (Simplified)_China.936
[3] LC_MONETARY=Chinese (Simplified)_China.936 LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.936

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] reactome.db_1.76.0 graphite_1.38.0 org.Hs.eg.db_3.13.0 AnnotationDbi_1.54.1
[5] IRanges_2.26.0 S4Vectors_0.30.2 Biobase_2.52.0 BiocGenerics_0.38.0
[9] ReactomePA_1.36.0 clusterProfiler_4.0.5 ggplot2_3.3.5

loaded via a namespace (and not attached):
[1] fgsea_1.18.0 colorspace_2.0-2 ggtree_3.0.4 ellipsis_0.3.2
[5] qvalue_2.24.0 XVector_0.32.0 aplot_0.1.1 rstudioapi_0.13
[9] farver_2.1.0 graphlayouts_0.7.1 ggrepel_0.9.1 bit64_4.0.5
[13] fansi_0.5.0 scatterpie_0.1.7 splines_4.1.1 cachem_1.0.6
[17] GOSemSim_2.18.1 polyclip_1.10-0 jsonlite_1.7.2 GO.db_3.13.0
[21] png_0.1-7 graph_1.70.0 ggforce_0.3.3 BiocManager_1.30.16
[25] compiler_4.1.1 httr_1.4.2 backports_1.2.1 assertthat_0.2.1
[29] Matrix_1.3-4 fastmap_1.1.0 lazyeval_0.2.2 tweenr_1.0.2
[33] tools_4.1.1 igraph_1.2.6 gtable_0.3.0 glue_1.4.2
[37] GenomeInfoDbData_1.2.6 reshape2_1.4.4 DO.db_2.9 dplyr_1.0.7
[41] rappdirs_0.3.3 fastmatch_1.1-3 Rcpp_1.0.7 enrichplot_1.12.3
[45] vctrs_0.3.8 Biostrings_2.60.2 ape_5.5 nlme_3.1-153
[49] ggraph_2.0.5 stringr_1.4.0 lifecycle_1.0.1 DOSE_3.18.3
[53] zlibbioc_1.38.0 MASS_7.3-54 scales_1.1.1 tidygraph_1.2.0
[57] RColorBrewer_1.1-2 curl_4.3.2 memoise_2.0.0 gridExtra_2.3
[61] downloader_0.4 ggfun_0.0.4 yulab.utils_0.0.4 stringi_1.7.5
[65] RSQLite_2.2.8 tidytree_0.3.5 checkmate_2.0.0 BiocParallel_1.26.2
[69] GenomeInfoDb_1.28.4 rlang_0.4.11 pkgconfig_2.0.3 bitops_1.0-7
[73] lattice_0.20-45 purrr_0.3.4 labeling_0.4.2 treeio_1.16.2
[77] patchwork_1.1.1 cowplot_1.1.1 shadowtext_0.0.9 bit_4.0.4
[81] tidyselect_1.1.1 plyr_1.8.6 magrittr_2.0.1 R6_2.5.1
[85] generics_0.1.0 DBI_1.1.1 pillar_1.6.3 withr_2.4.2
[89] KEGGREST_1.32.0 RCurl_1.98-1.5 tibble_3.1.4 crayon_1.4.1
[93] utf8_1.2.2 viridis_0.6.2 grid_4.1.1 data.table_1.14.2
[97] blob_1.2.2 digest_0.6.28 tidyr_1.1.4 gridGraphics_0.5-1
[101] munsell_0.5.0 viridisLite_0.4.0 ggplotify_0.1.0

Freya-Cui-2020, Oct 18 '21 14:10