enrichplot
How to show labels for several selected genes
Hi, I tried to make my plot and found that there were too many gene labels in the graph. I know I can solve this by reducing the number of gene nodes, but I'm wondering whether it is possible to draw all the gene nodes and show labels for only some of them? Looking forward to your reply.
Because the plots generated by enrichplot are ggplot2 objects, this is relatively straightforward: you can add layers to, or remove layers from, the plot.
To fully reproduce the plots, you also need to know the values of the relevant arguments that are used when generating the plot. You can find these in the source code.
Assuming you would like to modify a cnetplot, these are the relevant lines in the function cnetplot:
https://github.com/YuLab-SMU/enrichplot/blob/591b7d1dd4c55c7f79446fe30265c78a64af0540/R/cnetplot.R#L328-L332
The unexposed function add_node_label is defined in emapplot_utilities.R:
https://github.com/YuLab-SMU/enrichplot/blob/591b7d1dd4c55c7f79446fe30265c78a64af0540/R/emapplot_utilities.R#L460
Note that add_node_label uses geom_node_text (from ggraph). To work out which values to pass to the arguments of geom_node_text, check the full source code of add_node_label as well.
Below is some code as a proof of concept.
> ## Load required libraries
> library(clusterProfiler)
> library(enrichplot)
> library(org.Hs.eg.db)
> library(ggraph) ## for function 'geom_node_text'
>
> ## Load example data
> data(geneList, package="DOSE")
>
> ## Goal: in a cnetplot, plot labels for only these 3 genes
> genes2plot <- c("AURKA", "CDC20", "CENPE")
>
> ## Which KEGG pathways are over-represented in a
> ## list of genes?
> de <- names(geneList)[1:100]
> x <- enrichKEGG(de)
Reading KEGG annotation online: "https://rest.kegg.jp/link/hsa/pathway"...
Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/hsa"...
> x <- setReadable(x, "org.Hs.eg.db", "ENTREZID")
> x2 <- pairwise_termsim(x)
>
> ## Set number of gene sets to plot.
> ## This number (also!) corresponds to the first lines
> ## in the data slot of the plotted object.
> ##
> ## Note: default value = 5.
> showCategory <- 5
>
> ## Create cnetplot.
> ## note that all genes are plotted!
> p1 <- cnetplot(x2, showCategory=showCategory)
>
> print(p1)
>
> ## Check which layer contains the gene symbols
> ## Answer: layer 3.
> p1$layers
[[1]]
mapping: edge_alpha = ~I(alpha), x = ~x, y = ~y, xend = ~xend, yend = ~yend, group = ~edge.id
geom_edge_path: arrow = NULL, lineend = butt, linejoin = round, linemitre = 1, interpolate = FALSE, label_colour = black, label_alpha = 1, label_parse = FALSE, check_overlap = FALSE, angle_calc = rot, force_flip = TRUE, label_dodge = NULL, label_push = NULL, na.rm = FALSE
stat_edge_link: n = 100, na.rm = FALSE
position_identity
[[2]]
mapping: colour = ~I(color), size = ~size, alpha = ~I(alpha), x = ~x, y = ~y
geom_point: na.rm = FALSE
stat_filter: na.rm = FALSE
position_identity
[[3]]
mapping: label = ~name, x = ~x, y = ~y
geom_text_repel: parse = FALSE, na.rm = FALSE
stat_filter: na.rm = FALSE
position_identity
>
> ## Remove layer 3 from p1
> p1$layers[[3]] <- NULL
>
> ## Check; indeed no labels are plotted!
> print(p1)
>
> ## Replace removed layer by new one in which only selected genes are plotted.
> ## Note that the first rows in the data slot of the graph (i.e. p1$data)
> ## correspond to the number of gene sets to be shown in the plot,
> ## thus to the value of the argument 'showCategory'.
> ## This is important, because gene set nodes should always be plotted,
> ## although their names do not match with the content of 'genes2plot'.
> ## Hence, for subsetting p1$data, names of gene sets and genes2plot need to be concatenated
> ## in the line below [= c(p1$data$name[1:showCategory],genes2plot) ].
>
> ## Check data slot of object
> p1$data
# A tibble: 33 × 9
x y name size color .ggraph.orig_index circular .ggraph.index
<dbl> <dbl> <chr> <dbl> <chr> <int> <lgl> <int>
1 -0.964 1.56 Cell c… 12 #E5C… 1 FALSE 1
2 -1.19 -1.33 Cellul… 7 #E5C… 2 FALSE 2
3 -2.76 -0.0747 Oocyte… 6 #E5C… 3 FALSE 3
4 2.77 -2.83 Motor … 7 #E5C… 4 FALSE 4
5 3.74 2.19 IL-17 … 5 #E5C… 5 FALSE 5
6 0.0479 2.84 CDC45 3 #B3B… 6 FALSE 6
7 -2.72 1.46 CDC20 3 #B3B… 7 FALSE 7
8 -1.03 -0.0836 CCNB2 3 #B3B… 8 FALSE 8
9 -0.610 3.14 NDC80 3 #B3B… 9 FALSE 9
10 -1.84 0.141 CCNA2 3 #B3B… 10 FALSE 10
# ℹ 23 more rows
# ℹ 1 more variable: alpha <dbl>
# ℹ Use `print(n = ...)` to see more rows
>
> ## Subset data slot, and prepare new plot.
> p2 <- p1 + geom_node_text(data=p1$data[which( p1$data$name %in% c(p1$data$name[1:showCategory],genes2plot) ),],
+ aes(x=x, y=y, label=name), color='black', bg.color="white",
+ segment.size=0.5, repel=TRUE, size=5)
>
> print(p2)
>
> ## Done!
>
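The same remove-and-replace pattern can be sketched on a plain ggplot2 object, locating the label layer by its geom class instead of hard-coding the layer index. This is only a minimal sketch: the data frame `nodes`, the count `n_sets`, and the plot are toy stand-ins (assumptions) for `p1$data`, `showCategory`, and the cnetplot. For the cnetplot above the label layer prints as geom_text_repel, so the class to look for there would presumably be "GeomTextRepel" (an assumption; check with `class(p1$layers[[3]]$geom)`).

```r
library(ggplot2)

## Toy stand-in for p1$data: the first rows are gene sets, the rest are genes.
nodes <- data.frame(
  x    = c(0, 3, -1, 1, 2, 4),
  y    = c(0, 0, 1, 1, -1, 1),
  name = c("Set A", "Set B", "AURKA", "CDC20", "CENPE", "TTK")
)
n_sets     <- 2                      # plays the role of 'showCategory'
genes2plot <- c("AURKA", "CENPE")    # genes that should keep their label

## Plot with points and labels for every node.
p <- ggplot(nodes, aes(x, y)) +
  geom_point() +
  geom_text(aes(label = name), vjust = -1)

## Find label layers by geom class rather than by position.
label_classes <- c("GeomText", "GeomLabel", "GeomTextRepel")
is_label <- vapply(p$layers,
                   function(l) inherits(l$geom, label_classes),
                   logical(1))

## Drop the label layers.
p_nolabel <- p
p_nolabel$layers <- p$layers[!is_label]

## Add labels back for gene sets plus the selected genes only.
keep <- nodes$name %in% c(nodes$name[seq_len(n_sets)], genes2plot)
p_selected <- p_nolabel +
  geom_text(data = nodes[keep, ], aes(label = name), vjust = -1)
```

Searching by class is a little more robust than `p1$layers[[3]] <- NULL` if the order of layers changes between package versions.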
This was a great explanation and it also works nicely.
Is there also a way to modify the plot to remove all the edges and nodes of the genes I don't want to label?
Or make them transparent?
library(ggtangle)
set.seed(123)
x <- list(A = letters[1:10], B=letters[5:12], C=letters[sample(1:26, 15)])
cnetplot(x, node_label = letters[1:5])
we get:
If you want to do the subsetting:
options(cnetplot_subset = TRUE)
cnetplot(x, node_label = letters[1:5])
You may also be interested in https://github.com/YuLab-SMU/enrichplot/issues/194 and https://github.com/YuLab-SMU/enrichplot/issues/253.
All of these currently need the GitHub version of ggtangle.
Not sure how it should work.
I installed ggtangle version 0.0.4.003, but when I run your code example I get an error:
> library(ggtangle)
> set.seed(123)
> x <- list(A = letters[1:10], B=letters[5:12], C=letters[sample(1:26, 15)])
>
> cnetplot(x, node_label = letters[1:5])
Error in match.arg(node_label, c("category", "gene", "all", "none")) :
'arg' must be of length 1
How do I make it recognize the correct parameters?
EDIT:
OK, I know I need to use ggtangle::cnetplot, but how do I use this with an enrichResult object?
Sorry about that. I'll continue here, as I think it has more to do with this package.
Unfortunately it doesn't work as it is supposed to.
When I try this command:
enrichplot::cnetplot(ego.BP, color.params = list(foldChange = sig.genes, edge = TRUE), node_label = names(sig.genes)[1:5],
showCategory = go_of_interest$Name, # layout = 'circle',
cex.params = list(gene_node = 0.5, gene_label = 0.5)
)
(> names(sig.genes)[1:5] => [1] "IL13RA2" "COMP" "RFLNA" "TGM2" "CCL2" )
I get the following error:
Error in match.arg(node_label, c("category", "gene", "all", "none")) :
'arg' must be of length 1
In what order should I load the two packages?
Your enrichplot is out of date; that version internally uses ggraph instead of ggtangle.
For any new feature, the first thing you need to do is make sure all packages are up to date.
Thanks. Now it works nicely.
Just chiming in with some issues and 6 (small) questions regarding fine-tuning of the new functionality...
See below for code + questions.
> ## Load required libraries
> library(clusterProfiler)
> library(enrichplot)
> library(org.Hs.eg.db)
>
> ## Load example data
> data(geneList, package="DOSE")
>
> ## 3 genes randomly selected
> genes2plot <- c("AURKA", "CDC20", "CENPE")
>
> ## Which KEGG pathways are over-represented in a
> ## list of selected genes?
> de <- names(geneList)[1:100]
> x <- enrichKEGG(de)
Reading KEGG annotation online: "https://rest.kegg.jp/link/hsa/pathway"...
Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/hsa"...
> x <- setReadable(x, "org.Hs.eg.db", "ENTREZID")
> x2 <- pairwise_termsim(x)
>
> x2
#
# over-representation test
#
#...@organism hsa
#...@ontology KEGG
#...@keytype ENTREZID
#...@gene chr [1:100] "4312" "8318" "10874" "55143" "55388" "991" "6280" "2305" ...
#...pvalues adjusted by 'BH' with cutoff <0.05
#...9 enriched terms found
'data.frame': 9 obs. of 14 variables:
$ category : chr "Cellular Processes" "Cellular Processes" "Cellular Processes" "Cellular Processes" ...
$ subcategory : chr "Cell growth and death" "Cell growth and death" "Cell growth and death" "Cell motility" ...
$ ID : chr "hsa04110" "hsa04218" "hsa04114" "hsa04814" ...
$ Description : chr "Cell cycle" "Cellular senescence" "Oocyte meiosis" "Motor proteins" ...
$ GeneRatio : chr "12/58" "7/58" "6/58" "7/58" ...
$ BgRatio : chr "158/8768" "157/8768" "139/8768" "197/8768" ...
$ RichFactor : num 0.0759 0.0446 0.0432 0.0355 0.0526 ...
$ FoldEnrichment: num 11.48 6.74 6.53 5.37 7.96 ...
$ zScore : num 10.85 5.92 5.36 5.06 5.56 ...
$ pvalue : num 3.36e-10 7.21e-05 2.93e-04 2.96e-04 3.91e-04 ...
$ p.adjust : num 4.07e-08 4.36e-03 8.96e-03 8.96e-03 9.46e-03 ...
$ qvalue : num 3.75e-08 4.02e-03 8.26e-03 8.26e-03 8.73e-03 ...
$ geneID : chr "CDC45/CDC20/CCNB2/NDC80/CCNA2/CDK1/MAD2L1/CDT1/TTK/AURKB/CHEK1/TRIP13" "FOXM1/MYBL2/CCNB2/CCNA2/CDK1/CALML5/CHEK1" "CDC20/CCNB2/CDK1/MAD2L1/CALML5/AURKA" "KIF23/CENPE/KIF18A/KIF11/KIFC1/KIF18B/KIF20A" ...
$ Count : int 12 7 6 7 5 5 5 4 6
#...Citation
S Xu, E Hu, Y Cai, Z Xie, X Luo, L Zhan, W Tang, Q Wang, B Liu, R Wang, W Xie, T Wu, L Xie, G Yu. Using clusterProfiler to characterize multiomics data. Nature Protocols. 2024, doi:10.1038/s41596-024-01020-z
>
> ## 1) Create default cnetplot.
> ## note that all genes are plotted!
> p1 <- cnetplot(x2)
>
> print(p1)
>
> ## 2) Create cnetplot in which 3 gene sets are shown,
> ## and only the 3 genes are labelled.
> ## also color nodes based on fold change.
> p2 <- enrichplot::cnetplot(x2, showCategory = c("Cell cycle", "Motor proteins", "Oocyte meiosis"),
+ node_label=genes2plot , foldChange = geneList)
>
> print(p2)
>
Q1: the labels of the gene sets are somehow missing in p2... Can this be fixed?
Q2: the legend contains many [=7] dots (size ranging from 6-12). Could the number of dots in the legend be reduced? To (say) only 3 or 4??
> ## 3) Same as 2, but remove the edges from the genes that are not labelled.
> ## Note the use of options() to enable this.
> options(cnetplot_subset = TRUE)
> p3 <- enrichplot::cnetplot(x2, showCategory = c("Cell cycle", "Motor proteins", "Oocyte meiosis"),
+ node_label=genes2plot , foldChange = geneList)
>
> print(p3)
>
Q3: in p3 the dots in the legend contain fractions, but that number rather should be integers only?
Q4: Again, reduce the number of dots shown in the legend? From 5 to 2?
Q5: Also, the numbers in the legend should rather reflect the number of genes 'initially belonging' to the gene set, and NOT to the number of genes that are finally kept in the plot? [Thus, for example, for set "Motor proteins" size should (always) be 7 instead of 1]? See content of x2 in first code chunk.
Note: the labels of the gene sets are now plotted?!
> ## 4) Same as 3, but use different layout. (default is: igraph::layout_nicely).
> ## Also increase (multiply) size of gene sets nodes (by using the argument 'size_category').
> p4 <- enrichplot::cnetplot(x2, layout = igraph::layout_as_tree, showCategory = c("Cell cycle", "Motor proteins", "Oocyte meiosis"),
+ size_category = 2, node_label=genes2plot, foldChange = geneList)
>
> print(p4)
>
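To make Q5 concrete, the two candidate sizes can be recomputed in base R from the geneID strings printed for x2 above. This is only a sketch: the vector `gene_sets` is typed in by hand from that output (with a real object one would presumably read the Description and geneID columns of the result instead), and `full_size` and `kept_size` are hypothetical names.

```r
## geneID strings for the three plotted gene sets, copied from the output of x2.
gene_sets <- c(
  "Cell cycle"     = "CDC45/CDC20/CCNB2/NDC80/CCNA2/CDK1/MAD2L1/CDT1/TTK/AURKB/CHEK1/TRIP13",
  "Oocyte meiosis" = "CDC20/CCNB2/CDK1/MAD2L1/CALML5/AURKA",
  "Motor proteins" = "KIF23/CENPE/KIF18A/KIF11/KIFC1/KIF18B/KIF20A"
)
genes2plot <- c("AURKA", "CDC20", "CENPE")

## Split each string into its member genes.
members <- strsplit(gene_sets, "/", fixed = TRUE)

## Size based on all genes initially belonging to each set.
full_size <- lengths(members)

## Size based only on the genes that remain after subsetting to genes2plot.
kept_size <- vapply(members, function(g) sum(g %in% genes2plot), integer(1))

print(rbind(full_size, kept_size))
```

Here `full_size` gives 12, 6 and 7, matching the Count column of x2, while `kept_size` gives 1, 2 and 1, which is consistent with the size of 1 reported for "Motor proteins" in p3.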
> ## 5) Same as 2, but use different layout. (default is: igraph::layout_nicely)
> ## note that cnetplot_subset = FALSE (= restored to default)
> options(cnetplot_subset = FALSE)
> p5 <- enrichplot::cnetplot(x2, layout = igraph::layout_as_tree, showCategory = c("Cell cycle", "Motor proteins", "Oocyte meiosis"),
+ size_category = 2, node_label=genes2plot, foldChange = geneList)
>
> print(p5)
>
Q6: note that the labels of the gene sets are again missing in p5...
More info on igraph layouts: https://igraph.org/r/html/1.2.5/layout_.html
For inspiration some examples:
The first 2 example pictures are taken from: https://biosakshat.github.io/network-analysis.html#network-layouts Last example picture taken from: https://dshizuka.github.io/networkanalysis/03_plots.html
> packageVersion("enrichplot")
[1] ‘1.27.1.4’
> packageVersion("ggtangle")
[1] ‘0.0.4.3’
> packageVersion("clusterProfiler")
[1] ‘4.14.0’
>
> sessionInfo()
R version 4.4.2 (2024-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
time zone: Europe/Amsterdam
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] org.Hs.eg.db_3.20.0 AnnotationDbi_1.68.0 IRanges_2.40.0
[4] S4Vectors_0.44.0 Biobase_2.66.0 BiocGenerics_0.52.0
[7] enrichplot_1.27.1.004 clusterProfiler_4.14.0
loaded via a namespace (and not attached):
[1] tidyselect_1.2.1 dplyr_1.1.4 farver_2.1.2
[4] blob_1.2.4 R.utils_2.12.3 Biostrings_2.74.0
[7] lazyeval_0.2.2 fastmap_1.2.0 digest_0.6.37
[10] lifecycle_1.0.4 KEGGREST_1.46.0 tidytree_0.4.6
[13] RSQLite_2.3.7 magrittr_2.0.3 compiler_4.4.2
[16] rlang_1.1.4 tools_4.4.2 igraph_2.1.1
[19] utf8_1.2.4 data.table_1.16.2 ggtangle_0.0.4.003
[22] labeling_0.4.3 bit_4.5.0 gson_0.1.0
[25] plyr_1.8.9 RColorBrewer_1.1-3 aplot_0.2.3
[28] BiocParallel_1.40.0 withr_3.0.2 purrr_1.0.2
[31] R.oo_1.27.0 grid_4.4.2 fansi_1.0.6
[34] GOSemSim_2.32.0 colorspace_2.1-1 GO.db_3.20.0
[37] ggplot2_3.5.1 scales_1.3.0 cli_3.6.3
[40] crayon_1.5.3 treeio_1.30.0 generics_0.1.3
[43] ggtree_3.14.0 httr_1.4.7 reshape2_1.4.4
[46] ape_5.8 DBI_1.2.3 qvalue_2.38.0
[49] cachem_1.1.0 DOSE_4.0.0 stringr_1.5.1
[52] zlibbioc_1.52.0 splines_4.4.2 parallel_4.4.2
[55] ggplotify_0.1.2 XVector_0.46.0 yulab.utils_0.1.7
[58] vctrs_0.6.5 Matrix_1.7-1 jsonlite_1.8.9
[61] gridGraphics_0.5-1 patchwork_1.3.0 bit64_4.5.2
[64] ggrepel_0.9.6 tidyr_1.3.1 glue_1.8.0
[67] codetools_0.2-20 cowplot_1.1.3 stringi_1.8.4
[70] gtable_0.3.6 GenomeInfoDb_1.42.0 UCSC.utils_1.2.0
[73] munsell_0.5.1 tibble_3.2.1 pillar_1.9.0
[76] fgsea_1.32.0 GenomeInfoDbData_1.2.13 R6_2.5.1
[79] lattice_0.22-6 R.methodsS3_1.8.2 png_0.1-8
[82] memoise_2.0.1 ggfun_0.1.7 Rcpp_1.0.13-1
[85] fastmatch_1.1-4 nlme_3.1-166 fs_1.6.5
[88] pkgconfig_2.0.3
>
all fixed in ggtangle v >= 0.0.4.004.
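A quick way to check whether an installed version meets that requirement is sketched below in base R. The helper `is_too_old` is a hypothetical name, not part of any package. The sketch also shows why '0.0.4.3' (from packageVersion) and '0.0.4.003' (from sessionInfo) in the output above refer to the same version: leading zeros are ignored when versions are compared.

```r
## Leading zeros in a version component are ignored.
same_version <- numeric_version("0.0.4.003") == numeric_version("0.0.4.3")

## Hypothetical helper: TRUE if 'installed' is older than 'minimum'.
is_too_old <- function(installed, minimum) {
  numeric_version(installed) < numeric_version(minimum)
}

## The version reported in the session above, against the fixed version.
needs_update <- is_too_old("0.0.4.003", "0.0.4.004")

## Check the locally installed ggtangle, if there is one.
if (requireNamespace("ggtangle", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("ggtangle"))
  message("ggtangle ", installed,
          if (is_too_old(installed, "0.0.4.004")) " is too old" else " is recent enough")
}
```

For the session shown above, `needs_update` is TRUE, so that installation predates the fix.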
Dear Professor Yu,
Does the "node_label" argument also work for emapplot, for manually selecting which nodes get a label? Thank you!
@ChihYingLu : that is a good question, and AFAIK this is not possible (yet?).
@GuangchuangYu : could you please have a look at this?
@ChihYingLu @guidohooiveld they are not equivalent and we need to specify our needs in emapplot().