How to show labels for several selected genes

Open reliejiayou opened this issue 1 year ago • 1 comments

Hi, I tried to get my plot and found that there were too many gene labels in the graph. I know I can solve this problem by just reducing the number of gene nodes, but I'm wondering whether it is possible to draw all the gene nodes and choose just some of them to show their labels? Looking forward to your reply.

reliejiayou avatar Sep 23 '24 01:09 reliejiayou

Since the plots generated by enrichplot are ggplot2 objects, this is relatively straightforward to do; you can just add layers to, or remove layers from, the plot.
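To make the layer idea concrete, here is a minimal sketch that uses plain ggplot2 and a made-up toy data frame (not enrichplot; the data, the object names and the layer positions are assumptions for illustration only). It removes a label layer and adds back one that uses only a subset of the data.

```r
library(ggplot2)

## Toy data: five points, each with a name (hypothetical values).
df <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 1), name = letters[1:5])

## Plot with two layers: points (layer 1) and text labels (layer 2).
p <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  geom_text(aes(label = name), vjust = -1)

## Each element of p$layers is one geom.
n_layers_before <- length(p$layers)

## Remove the label layer (layer 2 in this toy plot).
p$layers[[2]] <- NULL

## Add a new label layer that only uses a subset of the data.
keep <- c("a", "c")
p_sub <- p + geom_text(data = df[df$name %in% keep, ],
                       aes(label = name), vjust = -1)
```

Which layer holds the labels differs per plot, so it is worth printing the layers of the plot first instead of relying on a fixed index.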

To fully reproduce the plots, one obviously also needs to know the values of the relevant arguments that are used when generating the plot. You can find these in the source code.

Assuming you would like to modify a cnetplot, these are the relevant lines in the function cnetplot: https://github.com/YuLab-SMU/enrichplot/blob/591b7d1dd4c55c7f79446fe30265c78a64af0540/R/cnetplot.R#L328-L332

The unexposed function add_node_label is defined in emapplot_utilities.R: https://github.com/YuLab-SMU/enrichplot/blob/591b7d1dd4c55c7f79446fe30265c78a64af0540/R/emapplot_utilities.R#L460 Note the use of geom_node_text in the full source code of add_node_label; the same code also shows which values to use for the arguments of geom_node_text.

Below is some code that demonstrates the proof of concept.

> ## Load required libraries
> library(clusterProfiler)
> library(enrichplot)
> library(org.Hs.eg.db)
> library(ggraph)  ## for function 'geom_node_text'
> 
> ## Load example data
> data(geneList, package="DOSE")
> 
> ## Goal: in a cnetplot, plot labels for only these 3 genes
> genes2plot <- c("AURKA", "CDC20", "CENPE")
> 
> ## Which KEGG pathways are over-represented in a 
> ## list of genes?
> de <- names(geneList)[1:100]
> x <- enrichKEGG(de)
Reading KEGG annotation online: "https://rest.kegg.jp/link/hsa/pathway"...
Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/hsa"...
> x <- setReadable(x, "org.Hs.eg.db", "ENTREZID")
> x2 <- pairwise_termsim(x)
> 
> ## Set number of gene sets to plot.
> ## This number (also!) corresponds to the first lines 
> ## in the data slot of the plotted object.
> ##
> ## Note: default value = 5.
> showCategory <- 5
> 
> ## Create cnetplot.
> ## note that all genes are plotted!
> p1 <- cnetplot(x2, showCategory=showCategory)
> 
> print(p1)
> 

image

> ## Check which layer contains the gene symbols
> ## Answer: layer 3.
> p1$layers
[[1]]
mapping: edge_alpha = ~I(alpha), x = ~x, y = ~y, xend = ~xend, yend = ~yend, group = ~edge.id 
geom_edge_path: arrow = NULL, lineend = butt, linejoin = round, linemitre = 1, interpolate = FALSE, label_colour = black, label_alpha = 1, label_parse = FALSE, check_overlap = FALSE, angle_calc = rot, force_flip = TRUE, label_dodge = NULL, label_push = NULL, na.rm = FALSE
stat_edge_link: n = 100, na.rm = FALSE
position_identity 

[[2]]
mapping: colour = ~I(color), size = ~size, alpha = ~I(alpha), x = ~x, y = ~y 
geom_point: na.rm = FALSE
stat_filter: na.rm = FALSE
position_identity 

[[3]]
mapping: label = ~name, x = ~x, y = ~y 
geom_text_repel: parse = FALSE, na.rm = FALSE
stat_filter: na.rm = FALSE
position_identity 

> 
> ## Remove layer 3 from p1
> p1$layers[[3]] <- NULL
> 
> ## Check; indeed no labels are plotted!
> print(p1)
>

image

> ## Replace removed layer by new one in which only selected genes are plotted.
> ## Note that the first rows in the data slot of the graph (i.e. p1$data)
> ## correspond to the number of gene sets to be shown in the plot, 
> ## thus to the value of the argument 'showCategory'.
> ## This is important, because gene set nodes should always be plotted,
> ## although their names do not match with the content of 'genes2plot'.
> ## Hence, for subsetting p1$data, names of gene sets and genes2plot need to be concatenated
> ## in the line below [=   c(p1$data$name[1:showCategory],genes2plot)  ].
> 
> ## Check data slot of object
> p1$data
# A tibble: 33 × 9
         x       y name     size color .ggraph.orig_index circular .ggraph.index
     <dbl>   <dbl> <chr>   <dbl> <chr>              <int> <lgl>            <int>
 1 -0.964   1.56   Cell c…    12 #E5C…                  1 FALSE                1
 2 -1.19   -1.33   Cellul…     7 #E5C…                  2 FALSE                2
 3 -2.76   -0.0747 Oocyte…     6 #E5C…                  3 FALSE                3
 4  2.77   -2.83   Motor …     7 #E5C…                  4 FALSE                4
 5  3.74    2.19   IL-17 …     5 #E5C…                  5 FALSE                5
 6  0.0479  2.84   CDC45       3 #B3B…                  6 FALSE                6
 7 -2.72    1.46   CDC20       3 #B3B…                  7 FALSE                7
 8 -1.03   -0.0836 CCNB2       3 #B3B…                  8 FALSE                8
 9 -0.610   3.14   NDC80       3 #B3B…                  9 FALSE                9
10 -1.84    0.141  CCNA2       3 #B3B…                 10 FALSE               10
# ℹ 23 more rows
# ℹ 1 more variable: alpha <dbl>
# ℹ Use `print(n = ...)` to see more rows
> 
> ## Subset data slot, and prepare new plot.
> p2 <- p1 + geom_node_text(data=p1$data[which( p1$data$name %in% c(p1$data$name[1:showCategory],genes2plot) ),],
+                           aes(x=x, y=y, label=name), color='black', bg.color="white",
+                           segment.size=0.5, repel=TRUE, size=5)
> 
> print(p2)
> 
> ## Done!
>

image
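The subsetting step above can be tried in isolation. The following is a minimal base R sketch with a hypothetical stand-in for p1$data (the data frame node_data and its values are made up; only the name column matters). It assumes, as in the transcript, that the gene set rows come first.

```r
## Toy stand-in for p1$data: gene-set rows first, gene rows after.
node_data <- data.frame(
  name = c("Cell cycle", "Oocyte meiosis",       # gene sets
           "CDC45", "CDC20", "AURKA", "CENPE"),  # genes
  stringsAsFactors = FALSE
)
showCategory <- 2
genes2plot <- c("AURKA", "CDC20", "CENPE")

## Names to label: all gene sets plus the selected genes.
labels_to_keep <- c(node_data$name[seq_len(showCategory)], genes2plot)

## Rows kept for the label layer; the row order of node_data is preserved.
label_data <- node_data[node_data$name %in% labels_to_keep, , drop = FALSE]
label_data$name
```

Here CDC45 is dropped from the label data, while both gene sets and the three selected genes remain.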

guidohooiveld avatar Sep 24 '24 21:09 guidohooiveld

This was a great explanation and it also works nicely.
Is there also a way to modify the plot to remove all the edges and nodes of the genes I don't want to label? Or make them transparent?

yeroslaviz avatar Nov 05 '24 13:11 yeroslaviz

library(ggtangle)
set.seed(123)
x <- list(A = letters[1:10], B=letters[5:12], C=letters[sample(1:26, 15)])

cnetplot(x, node_label = letters[1:5])

we get:

image

If you want to do the subsetting:

options(cnetplot_subset = TRUE)
cnetplot(x, node_label = letters[1:5])

image
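Because options() changes a global setting, it may be convenient to restore the previous state after making the plot. A minimal base R sketch (the variable names old and during are just for illustration, and the plotting call itself is left out):

```r
## options() returns the previous values, which makes restoring easy.
old <- options(cnetplot_subset = TRUE)
during <- getOption("cnetplot_subset")   # TRUE while the option is set

## ... create the subsetted cnetplot here ...

## Restore whatever was set before (unset if it was never set).
options(old)
after <- getOption("cnetplot_subset", default = FALSE)
```

This keeps later cnetplot calls in the same session from being subsetted unintentionally.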

You may also be interested in https://github.com/YuLab-SMU/enrichplot/issues/194 and https://github.com/YuLab-SMU/enrichplot/issues/253.

All of these currently need the GitHub version of ggtangle.

GuangchuangYu avatar Nov 06 '24 09:11 GuangchuangYu

Not sure how it should work.

I installed ggtangle version 0.0.4.003, but when I run your code example I get an error:

> library(ggtangle)
> set.seed(123)
> x <- list(A = letters[1:10], B=letters[5:12], C=letters[sample(1:26, 15)])
> 
> cnetplot(x, node_label = letters[1:5])
Error in match.arg(node_label, c("category", "gene", "all", "none")) : 
  'arg' must be of length 1

How do I make it recognize the correct parameters?

EDIT:

OK, I know I need to use ggtangle::cnetplot, but how do I use this with an enrichResult object?

yeroslaviz avatar Nov 06 '24 20:11 yeroslaviz

Sorry about that. I'll continue here, as I think it has more to do with this package.

Unfortunately, it doesn't work as it is supposed to.

When I try this command:

enrichplot::cnetplot(ego.BP, color.params = list(foldChange = sig.genes, edge = TRUE), node_label = names(sig.genes)[1:5],
    showCategory = go_of_interest$Name, # layout = 'circle',
    cex.params = list(gene_node = 0.5, gene_label = 0.5)
    ) 

(> names(sig.genes)[1:5] => [1] "IL13RA2" "COMP" "RFLNA" "TGM2" "CCL2" )

I get the following error:

Error in match.arg(node_label, c("category", "gene", "all", "none")) : 
  'arg' must be of length 1

In what order should I load the two packages?

yeroslaviz avatar Nov 07 '24 09:11 yeroslaviz

Your enrichplot is out of date; that version internally uses ggraph instead of ggtangle.
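The error message itself comes from base R's match.arg(), which by default accepts only a single value. A minimal sketch that reproduces it without enrichplot (old_cnetplot_label is a hypothetical stand-in, not the actual enrichplot code):

```r
## Stand-in for an interface where node_label must be one keyword.
old_cnetplot_label <- function(node_label = "all") {
  match.arg(node_label, c("category", "gene", "all", "none"))
}

## A single keyword is accepted.
single <- old_cnetplot_label("gene")

## A vector of gene symbols has length > 1, so match.arg() stops.
msg <- tryCatch(
  old_cnetplot_label(c("IL13RA2", "COMP", "RFLNA")),
  error = function(e) conditionMessage(e)
)
msg
```

So seeing this error with a vector of gene symbols suggests that the cnetplot being called still expects one of the four keywords.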

GuangchuangYu avatar Nov 07 '24 09:11 GuangchuangYu

For any new feature, the first thing you need to do is make sure all versions are up to date.

GuangchuangYu avatar Nov 07 '24 09:11 GuangchuangYu

Thanks. Now it works nicely.

yeroslaviz avatar Nov 07 '24 09:11 yeroslaviz

Just chiming in with some issues and 6 (small) questions regarding fine-tuning of the new functionality...

See below for code + questions.

> ## Load required libraries
> library(clusterProfiler)
> library(enrichplot)
> library(org.Hs.eg.db)
> 
> ## Load example data
> data(geneList, package="DOSE")
> 
> ## 3 genes randomly selected
> genes2plot <- c("AURKA", "CDC20", "CENPE")
> 
> ## Which KEGG pathways are over-represented in a 
> ## list of selected genes?
> de <- names(geneList)[1:100]
> x <- enrichKEGG(de)
Reading KEGG annotation online: "https://rest.kegg.jp/link/hsa/pathway"...
Reading KEGG annotation online: "https://rest.kegg.jp/list/pathway/hsa"...
> x <- setReadable(x, "org.Hs.eg.db", "ENTREZID")
> x2 <- pairwise_termsim(x)
> 
> x2
#
# over-representation test
#
#...@organism    hsa 
#...@ontology    KEGG 
#...@keytype     ENTREZID 
#...@gene        chr [1:100] "4312" "8318" "10874" "55143" "55388" "991" "6280" "2305" ...
#...pvalues adjusted by 'BH' with cutoff <0.05 
#...9 enriched terms found
'data.frame':   9 obs. of  14 variables:
 $ category      : chr  "Cellular Processes" "Cellular Processes" "Cellular Processes" "Cellular Processes" ...
 $ subcategory   : chr  "Cell growth and death" "Cell growth and death" "Cell growth and death" "Cell motility" ...
 $ ID            : chr  "hsa04110" "hsa04218" "hsa04114" "hsa04814" ...
 $ Description   : chr  "Cell cycle" "Cellular senescence" "Oocyte meiosis" "Motor proteins" ...
 $ GeneRatio     : chr  "12/58" "7/58" "6/58" "7/58" ...
 $ BgRatio       : chr  "158/8768" "157/8768" "139/8768" "197/8768" ...
 $ RichFactor    : num  0.0759 0.0446 0.0432 0.0355 0.0526 ...
 $ FoldEnrichment: num  11.48 6.74 6.53 5.37 7.96 ...
 $ zScore        : num  10.85 5.92 5.36 5.06 5.56 ...
 $ pvalue        : num  3.36e-10 7.21e-05 2.93e-04 2.96e-04 3.91e-04 ...
 $ p.adjust      : num  4.07e-08 4.36e-03 8.96e-03 8.96e-03 9.46e-03 ...
 $ qvalue        : num  3.75e-08 4.02e-03 8.26e-03 8.26e-03 8.73e-03 ...
 $ geneID        : chr  "CDC45/CDC20/CCNB2/NDC80/CCNA2/CDK1/MAD2L1/CDT1/TTK/AURKB/CHEK1/TRIP13" "FOXM1/MYBL2/CCNB2/CCNA2/CDK1/CALML5/CHEK1" "CDC20/CCNB2/CDK1/MAD2L1/CALML5/AURKA" "KIF23/CENPE/KIF18A/KIF11/KIFC1/KIF18B/KIF20A" ...
 $ Count         : int  12 7 6 7 5 5 5 4 6
#...Citation
S Xu, E Hu, Y Cai, Z Xie, X Luo, L Zhan, W Tang, Q Wang, B Liu, R Wang, W Xie, T Wu, L Xie, G Yu. Using clusterProfiler to characterize multiomics data. Nature Protocols. 2024, doi:10.1038/s41596-024-01020-z 
>
> ## 1) Create default cnetplot.
> ## note that all genes are plotted!
> p1 <- cnetplot(x2)
> 
> print(p1)
>

image

> ## 2) Create cnetplot in which 3 gene sets are shown,
> ## and only the 3 genes are labelled.
> ## also color nodes based on fold change.
> p2 <- enrichplot::cnetplot(x2, showCategory = c("Cell cycle", "Motor proteins", "Oocyte meiosis"),
+       node_label=genes2plot , foldChange = geneList) 
>  
> print(p2)
> 

image

Q1: the labels of the gene sets are somehow missing in p2... Can this be fixed?

Q2: the legend contains many [=7] dots (size ranging from 6-12). Could the number of dots in the legend be reduced, to (say) only 3 or 4?
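Regarding the number of legend dots: for an ordinary ggplot2 object, the entries of a size legend follow the breaks of the size scale. The sketch below shows this on a toy plot (the data and object names are made up). Whether adding such a scale to a cnetplot keeps its other size settings is an assumption that was not tested here; ggplot2 replaces an existing size scale when a new one is added.

```r
library(ggplot2)

## Toy data with sizes 6 to 12, similar to the legend range in question.
df <- data.frame(x = 1:7, y = 1:7, size = 6:12)

p <- ggplot(df, aes(x = x, y = y, size = size)) +
  geom_point()

## Show only three dots in the size legend by setting the breaks explicitly.
p_few <- p + scale_size_continuous(breaks = c(6, 9, 12))

## The breaks stored in the size scale of the new plot.
legend_breaks <- p_few$scales$get_scales("size")$breaks
```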


> ## 3) Same as 2, but remove the edges from the genes that are not labelled.
> ## Note the use of options() to enable this.
> options(cnetplot_subset = TRUE)
> p3 <- enrichplot::cnetplot(x2, showCategory = c("Cell cycle", "Motor proteins", "Oocyte meiosis"),
+       node_label=genes2plot , foldChange = geneList) 
>  
> print(p3)
> 

image

Q3: in p3 the dots in the legend are labelled with fractions, but shouldn't these numbers be integers only?

Q4: Again, could the number of dots shown in the legend be reduced? From 5 to 2?

Q5: Also, shouldn't the numbers in the legend reflect the number of genes 'initially belonging' to the gene set, and NOT the number of genes that are finally kept in the plot? [Thus, for example, for the set "Motor proteins" the size should (always) be 7 instead of 1.] See the content of x2 in the first code chunk.

Note: the labels of the gene sets are now plotted?!

## 4) Same as 3, but use different layout. (default is: igraph::layout_nicely). 
## Also increase (multiply) size of gene set nodes (by using the argument size_category).
p4 <- enrichplot::cnetplot(x2, layout = igraph::layout_as_tree, showCategory = c("Cell cycle", "Motor proteins", "Oocyte meiosis"),
      size_category = 2, node_label=genes2plot, foldChange = geneList) 

print(p4)

image

> ## 5) Same as 2, but use different layout. (default is: igraph::layout_nicely)
> ## note that cnetplot_subset = FALSE (= restored to default)
> options(cnetplot_subset = FALSE)
> p5 <- enrichplot::cnetplot(x2, layout = igraph::layout_as_tree, showCategory = c("Cell cycle", "Motor proteins", "Oocyte meiosis"),
+       size_category = 2, node_label=genes2plot, foldChange = geneList) 
> 
> print(p5)
> 


image

Q6: note that the labels of the gene sets are again missing in p5...

More info on igraph layouts: https://igraph.org/r/html/1.2.5/layout_.html

For inspiration, some examples:

unnamed-chunk-288-9

unnamed-chunk-288-10

unnamed-chunk-6-1

The first 2 example pictures are taken from: https://biosakshat.github.io/network-analysis.html#network-layouts

The last example picture is taken from: https://dshizuka.github.io/networkanalysis/03_plots.html

> packageVersion("enrichplot")
[1] ‘1.27.1.4’
> packageVersion("ggtangle")
[1] ‘0.0.4.3’
> packageVersion("clusterProfiler")
[1] ‘4.14.0’
>

> sessionInfo()
R version 4.4.2 (2024-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19042)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: Europe/Amsterdam
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] org.Hs.eg.db_3.20.0    AnnotationDbi_1.68.0   IRanges_2.40.0        
[4] S4Vectors_0.44.0       Biobase_2.66.0         BiocGenerics_0.52.0   
[7] enrichplot_1.27.1.004  clusterProfiler_4.14.0

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1        dplyr_1.1.4             farver_2.1.2           
 [4] blob_1.2.4              R.utils_2.12.3          Biostrings_2.74.0      
 [7] lazyeval_0.2.2          fastmap_1.2.0           digest_0.6.37          
[10] lifecycle_1.0.4         KEGGREST_1.46.0         tidytree_0.4.6         
[13] RSQLite_2.3.7           magrittr_2.0.3          compiler_4.4.2         
[16] rlang_1.1.4             tools_4.4.2             igraph_2.1.1           
[19] utf8_1.2.4              data.table_1.16.2       ggtangle_0.0.4.003     
[22] labeling_0.4.3          bit_4.5.0               gson_0.1.0             
[25] plyr_1.8.9              RColorBrewer_1.1-3      aplot_0.2.3            
[28] BiocParallel_1.40.0     withr_3.0.2             purrr_1.0.2            
[31] R.oo_1.27.0             grid_4.4.2              fansi_1.0.6            
[34] GOSemSim_2.32.0         colorspace_2.1-1        GO.db_3.20.0           
[37] ggplot2_3.5.1           scales_1.3.0            cli_3.6.3              
[40] crayon_1.5.3            treeio_1.30.0           generics_0.1.3         
[43] ggtree_3.14.0           httr_1.4.7              reshape2_1.4.4         
[46] ape_5.8                 DBI_1.2.3               qvalue_2.38.0          
[49] cachem_1.1.0            DOSE_4.0.0              stringr_1.5.1          
[52] zlibbioc_1.52.0         splines_4.4.2           parallel_4.4.2         
[55] ggplotify_0.1.2         XVector_0.46.0          yulab.utils_0.1.7      
[58] vctrs_0.6.5             Matrix_1.7-1            jsonlite_1.8.9         
[61] gridGraphics_0.5-1      patchwork_1.3.0         bit64_4.5.2            
[64] ggrepel_0.9.6           tidyr_1.3.1             glue_1.8.0             
[67] codetools_0.2-20        cowplot_1.1.3           stringi_1.8.4          
[70] gtable_0.3.6            GenomeInfoDb_1.42.0     UCSC.utils_1.2.0       
[73] munsell_0.5.1           tibble_3.2.1            pillar_1.9.0           
[76] fgsea_1.32.0            GenomeInfoDbData_1.2.13 R6_2.5.1               
[79] lattice_0.22-6          R.methodsS3_1.8.2       png_0.1-8              
[82] memoise_2.0.1           ggfun_0.1.7             Rcpp_1.0.13-1          
[85] fastmatch_1.1-4         nlme_3.1-166            fs_1.6.5               
[88] pkgconfig_2.0.3        
> 

guidohooiveld avatar Nov 07 '24 15:11 guidohooiveld

All fixed in ggtangle version >= 0.0.4.004.
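To check whether an installed version meets this requirement, version strings can be compared with base R. A minimal sketch (is_new_enough is a hypothetical helper; the version strings are the ones mentioned in this thread):

```r
## "0.0.4.003" and "0.0.4.3" are the same version once parsed.
same_version <- package_version("0.0.4.003") == package_version("0.0.4.3")

## Hypothetical helper: is an installed version at least the required one?
is_new_enough <- function(installed, required) {
  package_version(installed) >= package_version(required)
}

old_ok <- is_new_enough("0.0.4.3", "0.0.4.004")   # version reported above
new_ok <- is_new_enough("0.0.4.4", "0.0.4.004")   # first version with the fixes

## In a live session one would pass the installed version instead, e.g.
## is_new_enough(as.character(packageVersion("ggtangle")), "0.0.4.004")
```

With these inputs, the version used for the questions above (0.0.4.3) is just below the required one.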

GuangchuangYu avatar Nov 15 '24 13:11 GuangchuangYu

Dear Professor Yu,

Does the "node_label" argument also work for emapplot, for manually selecting specific labels on nodes? Thank you!

ChihYingLu avatar Nov 25 '24 06:11 ChihYingLu

@ChihYingLu : that is a good question, and AFAIK this is not possible (yet?).

@GuangchuangYu : could you please have a look at this?

guidohooiveld avatar Dec 05 '24 10:12 guidohooiveld

@ChihYingLu @guidohooiveld They are not equivalent, and we would first need to specify what is needed in emapplot().

GuangchuangYu avatar Dec 12 '24 07:12 GuangchuangYu