microeco plot_lefse_cladogram stuck in default use_taxa_num = 200

Hi, thank you very much for developing this great package for doing LEfSe analysis in R. The LEfSe cladogram based on ggtree is clearer and more beautiful than the original version.

I used a test dataset (1735 OTUs and 15 samples) to run LEfSe in microeco. When I execute $plot_lefse_cladogram with the default use_taxa_num = 200, it hangs for a long time.

With this dataset, if I decrease use_taxa_num to 124, it works fine and is fast (~0.45 s). When use_taxa_num > 124, it hangs for more than 1 hour with no output.

Stepping through the source code in microeco/R/trans_diff.R, I found that it gets stuck at this line: tree <- ggtree::ggtree(tree, size = 0.2, layout = 'circular')

Code as follows:

library(microeco)
library(ggtree)

meco_qiime2 = readRDS("meco_qiime2.rds")

meco_qiime2$cal_abund()

lefse_diff = trans_diff$new(dataset = meco_qiime2, method = "lefse", group = "Group")

png("test_lefse_caldogram.png",width = 4000,height = 3200,res = 300)
lefse_diff$plot_lefse_cladogram(use_taxa_num = 124, use_feature_num = 40, clade_label_level = 4,alpha = 0.2)
dev.off()

RDS file and cladogram: microeco_issue.zip

R session

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 8 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblas-r0.3.3.so

locale:
 [1] LC_CTYPE=zh_TW.UTF-8       LC_NUMERIC=C               LC_TIME=zh_TW.UTF-8        LC_COLLATE=zh_TW.UTF-8    
 [5] LC_MONETARY=zh_TW.UTF-8    LC_MESSAGES=zh_TW.UTF-8    LC_PAPER=zh_TW.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=zh_TW.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microeco_0.3.3

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5          BiocManager_1.30.10 pillar_1.4.7        compiler_4.0.3      RColorBrewer_1.1-2  plyr_1.8.6         
 [7] tools_4.0.3         digest_0.6.27       aplot_0.0.6         jsonlite_1.7.2      ggtree_2.4.1        tidytree_0.3.3     
[13] lifecycle_0.2.0     tibble_3.0.4        gtable_0.3.0        nlme_3.1-149        lattice_0.20-41     mgcv_1.8-33        
[19] pkgconfig_2.0.3     rlang_0.4.10        Matrix_1.2-18       rstudioapi_0.13     rvcheck_0.1.8       patchwork_1.1.1    
[25] parallel_4.0.3      treeio_1.14.3       dplyr_1.0.2         stringr_1.4.0       cluster_2.1.0       generics_0.1.0     
[31] vctrs_0.3.6         grid_4.0.3          tidyselect_1.1.0    glue_1.4.2          data.table_1.13.6   R6_2.5.0           
[37] farver_2.0.3        tidyr_1.1.2         ggplot2_3.3.3       purrr_0.3.4         reshape2_1.4.4      magrittr_2.0.1     
[43] scales_1.1.1        ellipsis_0.3.1      MASS_7.3-53         splines_4.0.3       permute_0.9-5       ape_5.4-1          
[49] colorspace_2.0-0    labeling_0.4.2      stringi_1.5.3       lazyeval_0.2.2      munsell_0.5.0       crayon_1.3.4       
[55] vegan_2.5-7

Thanks a lot!

Mar 25 '21 06:03 chikao0817

Hi, @chikao0817

The issue comes from the weird taxonomy information "c__c__" in the abund_table of lefse_diff: k__Bacteria|p__Candidatus_Melainabacteria|c__c__|o__Vampirovibrionales

Actually, this part of the code has a checking and filtering step for such taxonomy. However, it is only designed for unified taxonomy assignments, like the taxonomy info in the example table, which has been processed with the tidy_taxonomy() function. If you use tidy_taxonomy() on meco_qiime2$tax_table, "c__c__" will be converted to "c__", so in lefse_diff the line "k__Bacteria|p__Candidatus_Melainabacteria|c__|o__Vampirovibrionales" can be filtered by the checking code automatically, and the issue no longer exists.

To solve this problem now, you can check the abund_table in lefse_diff$abund_table, or use tidy_taxonomy() on meco_qiime2$tax_table and recalculate lefse_diff. In my view, uniform taxonomy information is very important in many data analysis methods. The following code works:

library(magrittr)  # provides the %<>% operator

meco_qiime2$tax_table %<>% tidy_taxonomy
meco_qiime2$tidy_dataset()
meco_qiime2$cal_abund()
lefse_diff <- trans_diff$new(dataset = meco_qiime2, method = "lefse", group = "Group")
lefse_diff$plot_lefse_cladogram(use_taxa_num = 200, use_feature_num = 40, clade_label_level = 4, alpha = 0.2)
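
For the "check the abund_table" route, a minimal base R sketch of how lineages with a doubled rank prefix such as "c__c__" might be spotted. The lineages vector is made-up illustration data; with a real object one would presumably pass the lineage strings from lefse_diff$abund_table instead (for example its row names, which is an assumption about how they are stored). The regular expression is also only an assumption about what a malformed label looks like, not the check that microeco itself performs.

```r
# Hypothetical lineage strings; stand-ins for the taxa in lefse_diff$abund_table
lineages <- c(
  "k__Bacteria|p__Candidatus_Melainabacteria|c__c__|o__Vampirovibrionales",
  "k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales",
  "k__Bacteria|p__Proteobacteria|c__|o__"
)

# A rank prefix is one lowercase letter followed by "__".
# Flag lineages where a prefix at the start of a rank is immediately repeated.
has_doubled_prefix <- grepl("(^|\\|)([a-z]__)\\2", lineages, perl = TRUE)

# Lineages that would need cleaning (e.g. with tidy_taxonomy()) before plotting
bad_lineages <- lineages[has_doubled_prefix]
print(bad_lineages)
```

If bad_lineages is non-empty, that suggests the taxonomy table has not been unified yet.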

Chi

Mar 25 '21 13:03 ChiLiubio

Thanks for the quick reply :)

Using tidy_taxonomy() to clean the taxonomy table and make the taxonomy information uniform is very useful, and it solved this problem.

Thank you very much.

Mar 25 '21 16:03 chikao0817
