MicrobiotaProcess Biomarker discovery explanation and new package's feature discussion

Biomarker discovery explanation and new package's feature discussion

Open alienzj opened this issue 2 years ago • 36 comments

Dear MPSE developer,

Recently I try to use MicrobiotaProcess to do microbiome data analysis. This is a very nice R package as far as I know.

So, I installed it and learn it based on the documentation.

I encountered two problem in reproducing the biomarker analysis.

and:

Any help would be greatly appreciated.

Oct 12 '21 04:10 alienzj

─ Session info ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 4.1.1 (2021-08-10)
 os       Arch Linux                  
 system   x86_64, linux-gnu           
 ui       RStudio                     
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       Asia/Hong_Kong              
 date     2021-10-12                  

─ Packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package              * version  date       lib source        
 ade4                   1.7-18   2021-09-16 [1] CRAN (R 4.1.1)
 ape                    5.5      2021-04-25 [1] CRAN (R 4.1.1)
 aplot                  0.1.1    2021-09-22 [1] CRAN (R 4.1.1)
 assertthat             0.2.1    2019-03-21 [1] CRAN (R 4.1.1)
 Biobase                2.52.0   2021-05-19 [1] Bioconductor  
 BiocGenerics           0.38.0   2021-05-19 [1] Bioconductor  
 BiocManager            1.30.16  2021-06-15 [1] CRAN (R 4.1.1)
 biomformat             1.20.0   2021-05-19 [1] Bioconductor  
 Biostrings             2.60.2   2021-08-05 [1] Bioconductor  
 bitops                 1.0-7    2021-04-24 [1] CRAN (R 4.1.1)
 bslib                  0.3.1    2021-10-06 [1] CRAN (R 4.1.1)
 cli                    3.0.1    2021-07-17 [1] CRAN (R 4.1.1)
 cluster                2.1.2    2021-04-17 [2] CRAN (R 4.1.1)
 codetools              0.2-18   2020-11-04 [2] CRAN (R 4.1.1)
 coin                   1.4-2    2021-10-08 [1] CRAN (R 4.1.1)
 colorspace             2.0-2    2021-06-24 [1] CRAN (R 4.1.1)
 corrr                  0.4.3    2020-11-24 [1] CRAN (R 4.1.1)
 crayon                 1.4.1    2021-02-08 [1] CRAN (R 4.1.1)
 data.table             1.14.2   2021-09-27 [1] CRAN (R 4.1.1)
 DBI                    1.1.1    2021-01-15 [1] CRAN (R 4.1.1)
 DelayedArray           0.18.0   2021-05-19 [1] Bioconductor  
 digest                 0.6.28   2021-09-23 [1] CRAN (R 4.1.1)
 dplyr                  1.0.7    2021-06-18 [1] CRAN (R 4.1.1)
 ellipsis               0.3.2    2021-04-29 [1] CRAN (R 4.1.1)
 evaluate               0.14     2019-05-28 [1] CRAN (R 4.1.1)
 fansi                  0.5.0    2021-05-25 [1] CRAN (R 4.1.1)
 farver                 2.1.0    2021-02-28 [1] CRAN (R 4.1.1)
 fastmap                1.1.0    2021-01-25 [1] CRAN (R 4.1.1)
 forcats              * 0.5.1    2021-01-27 [1] CRAN (R 4.1.1)
 foreach                1.5.1    2020-10-15 [1] CRAN (R 4.1.1)
 generics               0.1.0    2020-10-31 [1] CRAN (R 4.1.1)
 GenomeInfoDb           1.28.4   2021-09-05 [1] Bioconductor  
 GenomeInfoDbData       1.2.6    2021-08-12 [1] Bioconductor  
 GenomicRanges          1.44.0   2021-05-19 [1] Bioconductor  
 ggalluvial             0.12.3   2020-12-05 [1] CRAN (R 4.1.1)
 ggfun                  0.0.4    2021-09-17 [1] CRAN (R 4.1.1)
 gghalves               0.1.1    2020-11-08 [1] CRAN (R 4.1.1)
 ggnewscale             0.4.5    2021-01-11 [1] CRAN (R 4.1.1)
 ggplot2              * 3.3.5    2021-06-25 [1] CRAN (R 4.1.1)
 ggplotify              0.1.0    2021-09-02 [1] CRAN (R 4.1.1)
 ggrepel                0.9.1    2021-01-15 [1] CRAN (R 4.1.1)
 ggside                 0.1.2    2021-07-21 [1] CRAN (R 4.1.1)
 ggsignif               0.6.3    2021-09-09 [1] CRAN (R 4.1.1)
 ggstar               * 1.0.2    2021-04-07 [1] CRAN (R 4.1.1)
 ggtree               * 3.0.4    2021-08-22 [1] Bioconductor  
 ggtreeExtra          * 1.2.3    2021-09-12 [1] Bioconductor  
 glue                   1.4.2    2020-08-27 [1] CRAN (R 4.1.1)
 gridExtra              2.3      2017-09-09 [1] CRAN (R 4.1.1)
 gridGraphics           0.5-1    2020-12-13 [1] CRAN (R 4.1.1)
 gtable                 0.3.0    2019-03-25 [1] CRAN (R 4.1.1)
 htmltools              0.5.2    2021-08-25 [1] CRAN (R 4.1.1)
 igraph                 1.2.6    2020-10-06 [1] CRAN (R 4.1.1)
 IRanges                2.26.0   2021-05-19 [1] Bioconductor  
 iterators              1.0.13   2020-10-15 [1] CRAN (R 4.1.1)
 jquerylib              0.1.4    2021-04-26 [1] CRAN (R 4.1.1)
 jsonlite               1.7.2    2020-12-09 [1] CRAN (R 4.1.1)
 knitr                  1.36     2021-09-29 [1] CRAN (R 4.1.1)
 labeling               0.4.2    2020-10-20 [1] CRAN (R 4.1.1)
 lattice                0.20-44  2021-05-02 [2] CRAN (R 4.1.1)
 lazyeval               0.2.2    2019-03-15 [1] CRAN (R 4.1.1)
 libcoin                1.0-9    2021-09-27 [1] CRAN (R 4.1.1)
 lifecycle              1.0.1    2021-09-24 [1] CRAN (R 4.1.1)
 magrittr               2.0.1    2020-11-17 [1] CRAN (R 4.1.1)
 MASS                   7.3-54   2021-05-03 [2] CRAN (R 4.1.1)
 Matrix                 1.3-4    2021-06-01 [2] CRAN (R 4.1.1)
 MatrixGenerics         1.4.3    2021-08-26 [1] Bioconductor  
 matrixStats            0.61.0   2021-09-17 [1] CRAN (R 4.1.1)
 mgcv                   1.8-36   2021-06-01 [2] CRAN (R 4.1.1)
 MicrobiotaProcess    * 1.4.4    2021-10-03 [1] Bioconductor  
 modeltools             0.2-23   2020-03-05 [1] CRAN (R 4.1.1)
 multcomp               1.4-17   2021-04-29 [1] CRAN (R 4.1.1)
 multtest               2.48.0   2021-05-19 [1] Bioconductor  
 munsell                0.5.0    2018-06-12 [1] CRAN (R 4.1.1)
 mvtnorm                1.1-3    2021-10-08 [1] CRAN (R 4.1.1)
 nlme                   3.1-152  2021-02-04 [2] CRAN (R 4.1.1)
 patchwork              1.1.1    2020-12-17 [1] CRAN (R 4.1.1)
 permute                0.9-5    2019-03-12 [1] CRAN (R 4.1.1)
 phyloseq               1.36.0   2021-05-19 [1] Bioconductor  
 pillar                 1.6.3    2021-09-26 [1] CRAN (R 4.1.1)
 pkgconfig              2.0.3    2019-09-22 [1] CRAN (R 4.1.1)
 plyr                   1.8.6    2020-03-03 [1] CRAN (R 4.1.1)
 purrr                  0.3.4    2020-04-17 [1] CRAN (R 4.1.1)
 R6                     2.5.1    2021-08-19 [1] CRAN (R 4.1.1)
 Rcpp                   1.0.7    2021-07-07 [1] CRAN (R 4.1.1)
 RCurl                  1.98-1.5 2021-09-17 [1] CRAN (R 4.1.1)
 reshape2               1.4.4    2020-04-09 [1] CRAN (R 4.1.1)
 rhdf5                  2.36.0   2021-05-19 [1] Bioconductor  
 rhdf5filters           1.4.0    2021-05-19 [1] Bioconductor  
 Rhdf5lib               1.14.2   2021-07-06 [1] Bioconductor  
 rlang                  0.4.11   2021-04-30 [1] CRAN (R 4.1.1)
 rmarkdown              2.11     2021-09-14 [1] CRAN (R 4.1.1)
 rstudioapi             0.13     2020-11-12 [1] CRAN (R 4.1.1)
 S4Vectors              0.30.2   2021-10-03 [1] Bioconductor  
 sandwich               3.0-1    2021-05-18 [1] CRAN (R 4.1.1)
 sass                   0.4.0    2021-05-12 [1] CRAN (R 4.1.1)
 scales                 1.1.1    2020-05-11 [1] CRAN (R 4.1.1)
 sessioninfo            1.1.1    2018-11-05 [1] CRAN (R 4.1.1)
 stringi                1.7.5    2021-10-04 [1] CRAN (R 4.1.1)
 stringr                1.4.0    2019-02-10 [1] CRAN (R 4.1.1)
 SummarizedExperiment   1.22.0   2021-05-19 [1] Bioconductor  
 survival               3.2-11   2021-04-26 [2] CRAN (R 4.1.1)
 TH.data                1.1-0    2021-09-27 [1] CRAN (R 4.1.1)
 tibble                 3.1.5    2021-09-30 [1] CRAN (R 4.1.1)
 tidyr                  1.1.4    2021-09-27 [1] CRAN (R 4.1.1)
 tidyselect             1.1.1    2021-04-30 [1] CRAN (R 4.1.1)
 tidytree             * 0.3.5    2021-09-08 [1] CRAN (R 4.1.1)
 treeio                 1.16.2   2021-08-17 [1] Bioconductor  
 utf8                   1.2.2    2021-07-24 [1] CRAN (R 4.1.1)
 vctrs                  0.3.8    2021-04-29 [1] CRAN (R 4.1.1)
 vegan                  2.5-7    2020-11-28 [1] CRAN (R 4.1.1)
 viridisLite            0.4.0    2021-04-13 [1] CRAN (R 4.1.1)
 withr                  2.4.2    2021-04-18 [1] CRAN (R 4.1.1)
 xfun                   0.26     2021-09-14 [1] CRAN (R 4.1.1)
 XVector                0.32.0   2021-05-19 [1] Bioconductor  
 yaml                   2.2.1    2020-02-01 [1] CRAN (R 4.1.1)
 yulab.utils            0.0.4    2021-10-09 [1] CRAN (R 4.1.1)
 zlibbioc               1.38.0   2021-05-19 [1] Bioconductor  
 zoo                    1.8-9    2021-03-09 [1] CRAN (R 4.1.1)

[1] /home/zhujie/.R
[2] /usr/lib/R/library

Oct 12 '21 04:10 alienzj

This is because the geom_hilight of ggtree( released version) does not support the function for data parameter. This feature is supported by ggtree >= 3.1.3. But, you can use subset=nodeClass == Phylum in aes to plot it in here.

tp2 <- tp1 +
      geom_hilight(
        #data = td_filter(nodeClass == "Phylum"), # This is only supported by ggtree >=3.1.3.
        mapping = aes(subset = nodeClass == "Phylum", 
                      node = node, 
                      fill = label)
      )

The document you refer to is the development version (Bioconductor 3.14), you can refer to the document since you are using the released version (Bioconductor 3.13)

Oct 12 '21 05:10 xiangpin

@xiangpin Thank you for your quick reply. Nice!

But I meet the second problem.

Error: Can't subset columns that don't exist. x Column `RareAbundanceBySample` doesn't exist. Run `rlang::last_error()` to see where the error occurred.

At the same time, how can I use the usage in the document on the metaphlan3 profile?

Many thanks to your help !

Oct 12 '21 05:10 alienzj

Please check column names of associated data of the treedata have the RareAbundanceBySample.

> mouse.time.mpse %>% 
    mp_rrarefy() %>% 
    mp_cal_abundance %>% 
    mp_diff_analysis(
             .abundance = RelRareAbundanceBySample, 
             .group = time, 
             first.test.alpha = 0.01) %>% 
    mp_extract_tree("taxatree")
The otutree is empty in the MPSE object!
The RareAbundance was in the MPSE object, please check whether it has been rarefied !
The otutree is empty in the MPSE object!
'treedata' S4 object'.

...@ phylo:

Phylogenetic tree with 232 tips and 218 internal nodes.

Tip labels:
  OTU_58, OTU_67, OTU_231, OTU_188, OTU_150, OTU_207, ...
Node labels:
  r__root, k__Bacteria, p__Actinobacteria, p__Bacteroidetes, p__Cyanobacteria, p__Deinococcus-Thermus, ...

Rooted; no branch lengths.

with the following features available:
        'nodeClass',    'nodeDepth',    'RareAbundanceBySample',        'LDAupper',     'LDAmean',
        'LDAlower',     'Sign_time',    'pvalue',       'fdr'.

# The associated data tibble abstraction: 450 × 12
# The 'node', 'label' and 'isTip' are from the phylo tree.
    node label   isTip nodeClass nodeDepth RareAbundanceBySamp… LDAupper LDAmean
   <dbl> <chr>   <lgl> <chr>         <dbl> <list>                  <dbl>   <dbl>
 1     1 OTU_58  TRUE  OTU               8 <tibble [19 × 4]>          NA      NA
 2     2 OTU_67  TRUE  OTU               8 <tibble [19 × 4]>          NA      NA
 3     3 OTU_231 TRUE  OTU               8 <tibble [19 × 4]>          NA      NA
 4     4 OTU_188 TRUE  OTU               8 <tibble [19 × 4]>          NA      NA
 5     5 OTU_150 TRUE  OTU               8 <tibble [19 × 4]>          NA      NA
 6     6 OTU_207 TRUE  OTU               8 <tibble [19 × 4]>          NA      NA
 7     7 OTU_5   TRUE  OTU               8 <tibble [19 × 4]>          NA      NA
 8     8 OTU_80  TRUE  OTU               8 <tibble [19 × 4]>          NA      NA
 9     9 OTU_163 TRUE  OTU               8 <tibble [19 × 4]>          NA      NA
10    10 OTU_1   TRUE  OTU               8 <tibble [19 × 4]>          NA      NA
# … with 440 more rows, and 4 more variables: LDAlower <dbl>, Sign_time <chr>,
#   pvalue <dbl>, fdr <dbl>

I think the abundance of your metaphlan3 profile is the relative abundance not count. The mp_rrarefy requires the abundance must be count (integer) (and mp_cal_alpha also requires count (default)). To solve the problem, you can multiple the sample depth, then round to keep the integer part (this is also the solution of curatedMetagenomicData). Or you can use the relative abundance to perform the following analysis directly with force=TRUE in some functions (?mp_cal_alpha to see the help information of the function). This means the specified .abundance will be used to calculate directly not be rarefied (default). You can refer to the following examples

> library(MicrobiotaProcess)
> file1 <- system.file("extdata/MetaPhlAn", "metaphlan_test.txt", package="MicrobiotaProcess")
> sample.file <- system.file("extdata/MetaPhlAn", "sample_test.txt", package="MicrobiotaProcess")
> mpse1 <- mp_import_metaphlan(profile=file1, mapfilename=sample.file)
Warning message:
The number of features in otu table is not equal the number of features in taxonomy annotation.
* The same features will be extract automatically !
> mpse1
# A MPSE-tibble (MPSE object) abstraction: 5,260 × 11
# OTU=263 | Samples=20 | Assays=Abundance | Taxanomy=Kingdom, Phylum, Class, Order, Family, Genus
   OTU    Sample  Abundance group taxid  Kingdom Phylum Class Order Family Genus
   <chr>  <chr>       <dbl> <chr> <chr>  <chr>   <chr>  <chr> <chr> <chr>  <chr>
 1 s__Me… GupDM_…     0.596 testA 2157|… k__Arc… p__Eu… c__M… o__M… f__Me… g__M…
 2 s__Ac… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
 3 s__Ac… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
 4 s__Ac… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
 5 s__Bi… GupDM_…     0.948 testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 6 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 7 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 8 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 9 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
10 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
# … with 5,250 more rows
> mpse1 %>% mp_cal_alpha(.abundance=Abundance, force=TRUE)
# A MPSE-tibble (MPSE object) abstraction: 5,260 × 15
# OTU=263 | Samples=20 | Assays=Abundance | Taxanomy=Kingdom, Phylum, Class, Order, Family, Genus
   OTU      Sample Abundance group Observe Shannon Simpson     J taxid   Kingdom
   <chr>    <chr>      <dbl> <chr>   <int>   <dbl>   <dbl> <dbl> <chr>   <chr>
 1 s__Meth… GupDM…     0.596 testA      86    3.47   0.951 0.778 2157|2… k__Arc…
 2 s__Acti… GupDM…     0     testA      86    3.47   0.951 0.778 2|2011… k__Bac…
 3 s__Acti… GupDM…     0     testA      86    3.47   0.951 0.778 2|2011… k__Bac…
 4 s__Acti… GupDM…     0     testA      86    3.47   0.951 0.778 2|2011… k__Bac…
 5 s__Bifi… GupDM…     0.948 testA      86    3.47   0.951 0.778 2|2011… k__Bac…
 6 s__Bifi… GupDM…     0     testA      86    3.47   0.951 0.778 2|2011… k__Bac…
 7 s__Bifi… GupDM…     0     testA      86    3.47   0.951 0.778 2|2011… k__Bac…
 8 s__Bifi… GupDM…     0     testA      86    3.47   0.951 0.778 2|2011… k__Bac…
 9 s__Bifi… GupDM…     0     testA      86    3.47   0.951 0.778 2|2011… k__Bac…
10 s__Bifi… GupDM…     0     testA      86    3.47   0.951 0.778 2|2011… k__Bac…
# … with 5,250 more rows, and 5 more variables: Phylum <chr>, Class <chr>, Order <chr>,
#   Family <chr>, Genus <chr>
> mpse1 %>% mp_cal_alpha(.abundance=Abundance, force=TRUE) %>% mp_plot_alpha

alpha

> mpse1 %>% mp_cal_dist(.abundance=Abundance)
# A MPSE-tibble (MPSE object) abstraction: 5,260 × 12
# OTU=263 | Samples=20 | Assays=Abundance | Taxanomy=Kingdom, Phylum, Class, Order, Family, Genus
   OTU    Sample  Abundance group bray   taxid Kingdom Phylum Class Order Family
   <chr>  <chr>       <dbl> <chr> <list> <chr> <chr>   <chr>  <chr> <chr> <chr>
 1 s__Me… GupDM_…     0.596 testA <tibb… 2157… k__Arc… p__Eu… c__M… o__M… f__Me…
 2 s__Ac… GupDM_…     0     testA <tibb… 2|20… k__Bac… p__Ac… c__A… o__A… f__Ac…
 3 s__Ac… GupDM_…     0     testA <tibb… 2|20… k__Bac… p__Ac… c__A… o__A… f__Ac…
 4 s__Ac… GupDM_…     0     testA <tibb… 2|20… k__Bac… p__Ac… c__A… o__A… f__Ac…
 5 s__Bi… GupDM_…     0.948 testA <tibb… 2|20… k__Bac… p__Ac… c__A… o__B… f__Bi…
 6 s__Bi… GupDM_…     0     testA <tibb… 2|20… k__Bac… p__Ac… c__A… o__B… f__Bi…
 7 s__Bi… GupDM_…     0     testA <tibb… 2|20… k__Bac… p__Ac… c__A… o__B… f__Bi…
 8 s__Bi… GupDM_…     0     testA <tibb… 2|20… k__Bac… p__Ac… c__A… o__B… f__Bi…
 9 s__Bi… GupDM_…     0     testA <tibb… 2|20… k__Bac… p__Ac… c__A… o__B… f__Bi…
10 s__Bi… GupDM_…     0     testA <tibb… 2|20… k__Bac… p__Ac… c__A… o__B… f__Bi…
# … with 5,250 more rows, and 1 more variable: Genus <chr>
>mpse1 %>% mp_cal_dist(.abundance=Abundance) %>% mp_plot_dist(.distmethod=bray)

dist

>mpse1 %>% mp_cal_pcoa(.abundance=Abundance, distmethod="bray") %>% mp_plot_ord(.ord=pcoa, show.sample=T)

pcoa

Oct 12 '21 06:10 xiangpin

@xiangpin Many many thanks for your help. Now, I begin to enjoy the microbiome data exploration with MicrobiotaProcess. Thanks !

Oct 12 '21 06:10 alienzj

@xiangpin Hi, I need your help~

Oct 12 '21 11:10 alienzj

The mpse3_d0_bnt is a MPSE object constructed with MetaPhlAn3 profile. I have no idea about .abundance=?. Could you please help me to solve this problem?

Oct 12 '21 11:10 alienzj

The MPSE object created from MetaPhlAn3 profile only contained the relative abundance of species (the count is also accepted if the relative abundance multiply by the depth of samples). Then the relative abundance of other taxonomy levels can be calculated using mp_cal_abundance with force=TRUE. then if you want to perform the different analyses, you can use mp_diff_analysis (it will check whether the abundance of all taxonomy has been calculated, if it is calculated then it will be used directly, otherwise, it will be calculated in the function internal using mp_cal_abundance) with the specified abundance column name of MPSE. You can use the following examples to run it.

>library(MicrobiotaProcess)
>file1 <- system.file("extdata/MetaPhlAn", "metaphlan_test.txt", package="MicrobiotaProcess")
>sample.file <- system.file("extdata/MetaPhlAn", "sample_test.txt", package="MicrobiotaProcess")
>mpse1 <- mp_import_metaphlan(profile=file1, mapfilename=sample.file)
>mpse1
# A MPSE-tibble (MPSE object) abstraction: 5,260 × 11
# OTU=263 | Samples=20 | Assays=Abundance | Taxanomy=Kingdom, Phylum, Class, Order, Family, Genus
   OTU    Sample  Abundance group taxid  Kingdom Phylum Class Order Family Genus
   <chr>  <chr>       <dbl> <chr> <chr>  <chr>   <chr>  <chr> <chr> <chr>  <chr>
 1 s__Me… GupDM_…     0.596 testA 2157|… k__Arc… p__Eu… c__M… o__M… f__Me… g__M…
 2 s__Ac… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
 3 s__Ac… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
 4 s__Ac… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
 5 s__Bi… GupDM_…     0.948 testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 6 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 7 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 8 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 9 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
10 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
# … with 5,250 more rows
Warning message:
The number of features in otu table is not equal the number of features in taxonomy annotation.
* The same features will be extract automatically !
>mpse1 %>% mp_cal_abundance(.abundance=Abundance, force=TRUE) %>% mp_diff_analysis(.abundance=RelAbundanceBySample, .group=group, force=T, relative=F)

Or you can perform mp_diff_analysis directly.

mpse1 %>% mp_diff_analysis(.abundance=Abundance, force=TRUE, .group=group)

Oct 12 '21 12:10 xiangpin

@xiangpin Hi, Shuangbin, thanks for your quick reply. After your meticulous and patient explanation, I began to gradually understand the processing logic of MPSE. I try to reproduce it , but the result does not seem to be right.

Oct 12 '21 12:10 alienzj

I performed the LEfSe analysis using LEfSe python package, it can identified some biomarker. But when I use the mp_diff_analysis, cann't identify any biomarker, like above example.

I did not correct the relative abundance with sampling depth as you suggested.

Oct 12 '21 12:10 alienzj

This is because the default of LEfSe does not perform the fdr correction for the first test. But the default of mp_diff_analysis set the method of filter pvalue of the first test to fdr (p.adjust and filter.p parameter). In addition, the method to check which group is more abundant in LEfSe is median, but the method of mp_diff_analysis is generalizedFC (fc.method parameter)

> library(MicrobiotaProcess)
> file1 <- system.file("extdata/MetaPhlAn", "metaphlan_test.txt", package="MicrobiotaProcess")
> sample.file <- system.file("extdata/MetaPhlAn", "sample_test.txt", package="MicrobiotaProcess")
> mpse1 <- mp_import_metaphlan(profile=file1, mapfilename=sample.file)
> mpse1
# A MPSE-tibble (MPSE object) abstraction: 5,260 × 11
# OTU=263 | Samples=20 | Assays=Abundance | Taxanomy=Kingdom, Phylum, Class, Order, Family, Genus
   OTU    Sample  Abundance group taxid  Kingdom Phylum Class Order Family Genus
   <chr>  <chr>       <dbl> <chr> <chr>  <chr>   <chr>  <chr> <chr> <chr>  <chr>
 1 s__Me… GupDM_…     0.596 testA 2157|… k__Arc… p__Eu… c__M… o__M… f__Me… g__M…
 2 s__Ac… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
 3 s__Ac… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
 4 s__Ac… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
 5 s__Bi… GupDM_…     0.948 testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 6 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 7 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 8 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 9 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
10 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
# … with 5,250 more rows
> mpse1 %>% mp_cal_abundance(.abundance=Abundance, force=TRUE, relative=FALSE) %>% mp_diff_analysis(.abundance=Abundance, force=TRUE, relative=FALSE, .group=group, filter.p="pvalue") 
The otutree is empty in the MPSE object!
# A MPSE-tibble (MPSE object) abstraction: 5,260 × 17
# OTU=263 | Samples=20 | Assays=Abundance | Taxanomy=Kingdom, Phylum, Class, Order, Family, Genus
   OTU    Sample  Abundance group taxid  Kingdom Phylum Class Order Family Genus
   <chr>  <chr>       <dbl> <chr> <chr>  <chr>   <chr>  <chr> <chr> <chr>  <chr>
 1 s__Me… GupDM_…     0.596 testA 2157|… k__Arc… p__Eu… c__M… o__M… f__Me… g__M…
 2 s__Ac… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
 3 s__Ac… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
 4 s__Ac… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
 5 s__Bi… GupDM_…     0.948 testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 6 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 7 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 8 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 9 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
10 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
# … with 5,250 more rows, and 6 more variables: LDAupper <dbl>, LDAmean <dbl>, LDAlower <dbl>,
#   Sign_group <chr>, pvalue <dbl>, fdr <dbl>

Oct 12 '21 14:10 xiangpin

@xiangpin Hi, Shuangbin, thanks for your reply.

The above example can be reproduced.

Can I use median as fc.method ?

Oct 12 '21 15:10 alienzj

I also have a another question, If I do twice mp_diff_analysis, and got two tree visualization, can be merge two tree into one tree ? If we have many samples, can we just show the mean relative abundance of each group in tree visualization ?

Oct 12 '21 15:10 alienzj

Can I use median as fc.method ? Yes, you can specify fc.method to compare_median to use median, or compare_mean to use mean.
If we have many samples, can we just show the mean relative abundance of each group in tree visualization ? Yes, you can run mp_cal_abundance with specified .group to calculate the relative abundance or (abundance) for each group. You can refer to the following example.

> mouse.time.mpse %>% mp_cal_abundance(.abundance=Abundance) %>% mp_cal_abundance(.abundance=Abundance, .group=time) %>% mp_diff_analysis(.abundance=RelRareAbundanceBySample, .group=time) %>% mp_extract_tree() -> trda
The otutree is empty in the MPSE object!
The RareAbundance was in the MPSE object, please check whether it has been rarefied !
The otutree is empty in the MPSE object!
The RareAbundance was in the MPSE object, please check whether it has been rarefied !
The otutree is empty in the MPSE object!

The first mp_cal_abundance is to calculate the relative abundance (default relative=TRUE, and rarefied will be done in the internal default force=TRUE) of all taxonomy for each sample, The second mp_cal_abundance is to calculate the relative abundance (the same as before) of all taxonomy for each group. And the results will be added to the associated data of treedata with the nested format.

You can print the treedata to view its format and structure. And you can use ggtree and ggtreeExtra to visualize it.

> trda
'treedata' S4 object'.

...@ phylo:

Phylogenetic tree with 232 tips and 218 internal nodes.

Tip labels:
  OTU_58, OTU_67, OTU_231, OTU_188, OTU_150, OTU_207, ...
Node labels:
  r__root, k__Bacteria, p__Actinobacteria, p__Bacteroidetes, p__Cyanobacteria, p__Deinococcus-Thermus, ...

Rooted; no branch lengths.

with the following features available:
        'nodeClass',    'nodeDepth',    'RareAbundanceBytime',  'RareAbundanceBySample',
        'LDAupper',     'LDAmean',      'LDAlower',     'Sign_time',    'pvalue',       'fdr'.

# The associated data tibble abstraction: 450 × 13
# The 'node', 'label' and 'isTip' are from the phylo tree.
    node label   isTip nodeClass nodeDepth RareAbundanceBytime RareAbundanceByS…
   <dbl> <chr>   <lgl> <chr>         <dbl> <list>              <list>
 1     1 OTU_58  TRUE  OTU               8 <tibble [2 × 3]>    <tibble [19 × 4]>
 2     2 OTU_67  TRUE  OTU               8 <tibble [2 × 3]>    <tibble [19 × 4]>
 3     3 OTU_231 TRUE  OTU               8 <tibble [2 × 3]>    <tibble [19 × 4]>
 4     4 OTU_188 TRUE  OTU               8 <tibble [2 × 3]>    <tibble [19 × 4]>
 5     5 OTU_150 TRUE  OTU               8 <tibble [2 × 3]>    <tibble [19 × 4]>
 6     6 OTU_207 TRUE  OTU               8 <tibble [2 × 3]>    <tibble [19 × 4]>
 7     7 OTU_5   TRUE  OTU               8 <tibble [2 × 3]>    <tibble [19 × 4]>
 8     8 OTU_80  TRUE  OTU               8 <tibble [2 × 3]>    <tibble [19 × 4]>
 9     9 OTU_163 TRUE  OTU               8 <tibble [2 × 3]>    <tibble [19 × 4]>
10    10 OTU_1   TRUE  OTU               8 <tibble [2 × 3]>    <tibble [19 × 4]>
# … with 440 more rows, and 6 more variables: LDAupper <dbl>, LDAmean <dbl>,
#   LDAlower <dbl>, Sign_time <chr>, pvalue <dbl>, fdr <dbl>
> trda %>% tidyr::unnest(RareAbundanceBytime)
# A tbl_df is returned for independent data analysis.
# A tibble: 898 × 15
    node label   isTip nodeClass nodeDepth time  RareAbundanceBy… RelRareAbundanc…
   <dbl> <chr>   <lgl> <chr>         <dbl> <chr>            <int>            <dbl>
 1     1 OTU_58  TRUE  OTU               8 Early                0          0
 2     1 OTU_58  TRUE  OTU               8 Late                 0          0
 3     2 OTU_67  TRUE  OTU               8 Early               18          0.0794
 4     2 OTU_67  TRUE  OTU               8 Late               113          0.449
 5     3 OTU_231 TRUE  OTU               8 Early                0          0
 6     3 OTU_231 TRUE  OTU               8 Late                 1          0.00397
 7     4 OTU_188 TRUE  OTU               8 Early                3          0.0132
 8     4 OTU_188 TRUE  OTU               8 Late                 7          0.0278
 9     5 OTU_150 TRUE  OTU               8 Early                8          0.0353
10     5 OTU_150 TRUE  OTU               8 Late                 5          0.0199
# … with 888 more rows, and 7 more variables: RareAbundanceBySample <list>,
#   LDAupper <dbl>, LDAmean <dbl>, LDAlower <dbl>, Sign_time <chr>,
#   pvalue <dbl>, fdr <dbl>
>

Just make a small modification to the example of vignettes

p1 <- trda %>%
      ggtree(
        layout="radial",
        size = 0.3
      ) +
      geom_point(
        data = td_filter(!isTip),
        fill="white",
        size=1,
        shape=21
      )
# display the high light of phylum clade.
p2 <- p1 +
      geom_hilight(
        #data = td_filter(nodeClass == "Phylum"), # This is supported by ggtree >=3.1.3
        mapping = aes(subset = nodeClass == "Phylum",
                      node = node,
                      fill = label)
      )
# display the relative abundance of features(OTU)
p3 <- p2 +
      ggnewscale::new_scale("fill") +
      geom_fruit(
         data = td_unnest(RareAbundanceBytime),
         geom = geom_star,
         mapping = aes(
                       x = time,
                       size = RelRareAbundanceBytime,
                       fill = time,
                       subset = RelRareAbundanceBytime > 0
                   ),
         starshape = 13,
         starstroke = 0.25,
         offset = 0.04,
         pwidth = 0.1,
         grid.params = list(linetype=2)
      ) +
      scale_size_continuous(
         name="Relative Abundance (%)",
         range = c(1, 3)
      ) +
      scale_fill_manual(values=c("#1B9E77", "#D95F02"))
# display the tip labels of taxa tree
p4 <- p3 + geom_tiplab(size=2, offset=2)
# display the LDA of significant OTU.
p5 <- p4 +
      ggnewscale::new_scale("fill") +
      geom_fruit(
         geom = geom_col,
         mapping = aes(
                       x = LDAmean,
                       fill = Sign_time,
                       subset = !is.na(LDAmean)
                       ),
         orientation = "y",
         offset = 0.3,
         pwidth = 0.5,
         axis.params = list(axis = "x",
                            title = "Log10(LDA)",
                            title.height = 0.01,
                            title.size = 2,
                            text.size = 1.8,
                            vjust = 1),
         grid.params = list(linetype = 2)
      )

# display the significant (FDR) taxonomy after kruskal.test (default)
p6 <- p5 +
      ggnewscale::new_scale("size") +
      geom_point(
         data=td_filter(!is.na(fdr)),
         mapping = aes(size = -log10(fdr),
                       fill = Sign_time,
                       ),
         shape = 21,
      ) +
      scale_size_continuous(range=c(1, 3)) +
      scale_fill_manual(values=c("#1B9E77", "#D95F02"))

p6 <- p6 + theme(
           legend.key.height = unit(0.3, "cm"),
           legend.key.width = unit(0.3, "cm"),
           legend.spacing.y = unit(0.02, "cm"),
           legend.text = element_text(size = 7),
           legend.title = element_text(size = 9),
          )
p6

PS: The display of results can be diverse, this example is just one of them, more types can be displayed using ggtree and ggtreeExtra easily since they inherit the ggplot2 grammar.

Oct 13 '21 02:10 xiangpin

@xiangpin Hi, Shuangbin, thanks for your reply. I was really amazed by the streaming operation in MPSE, you help me a lot, thanks so much!

I used fc.method="compare_median" and found that the LDA score calculated by mp_diff_analysis tends to be larger. I don't know much about the calculation details inside LEfSe, maybe I have not set the same parameters.

Oct 13 '21 04:10 alienzj

The LDA score was calculated using multiple iteration subset datasets (extract from the original dataset (default relative abundance table of all taxonomy)), which is random. To make the analysis reproducible, An random seed (seed parameter) was introduced through withr package in the internal of mp_diff_analysis. However, The method of LEfSe might have a problem, it used the random seed in the function, which might cause the subset dataset will be the same for each subset. This means it is not cross-validation-like.

Oct 13 '21 05:10 xiangpin

@xiangpin Hi, Shuangbin, thanks for your answer.

The method of LEfSe might have a problem: You means MPSE or LEfSe python package ?

Oct 13 '21 05:10 alienzj

LEfSe python package

Oct 13 '21 06:10 xiangpin

OK, I choose to use MPSE, and also recommand it to my friends and collegues.

Thanks so much!

Oct 13 '21 06:10 alienzj

Thanks, that is great.

Oct 13 '21 07:10 xiangpin

Running:

tree_ogd_ce_t2$data %>%
tidyr::unnest(AbundanceBySample) %>%
select(label, nodeClass, Sample, group, RelAbundanceBySample)

Got:

# A tibble: 1,155 × 5
   label                          nodeClass Sample        group RelAbundanceByS…
   <chr>                          <chr>     <chr>         <chr>            <dbl>
 1 s__Microbacterium_ginsengisoli OTU       CLN-control-A Cont…             0   
 2 s__Microbacterium_ginsengisoli OTU       CLN-control-B Cont…            89.0 
 3 s__Microbacterium_ginsengisoli OTU       CLN-control-C Cont…            92.5 
 4 s__Microbacterium_ginsengisoli OTU       CLN-corner-A  Corn…            75.0 
 5 s__Microbacterium_ginsengisoli OTU       CLN-corner-B  Corn…            77.7 
 6 s__Microbacterium_ginsengisoli OTU       CLN-corner-C  Corn…             9.68
 7 s__Microbacterium_ginsengisoli OTU       CLN-endo-A    Endo             92.9 
 8 s__Microbacterium_ginsengisoli OTU       CLN-endo-B    Endo             77.6 
 9 s__Microbacterium_ginsengisoli OTU       CLN-endo-C    Endo             76.6 
10 s__Microbacterium_ginsengisoli OTU       Empty-A       Empty            20.3 
# … with 1,145 more rows

Running:

tree_ogd_ce_t3 <- tree_ogd_ce_t2 +
      ggnewscale::new_scale("fill") +
      geom_fruit(
         data = td_unnest(AbundanceBySample),
         geom = geom_star,
         mapping = aes(x = fct_reorder(Sample, group, .fun=min),
                       size = RelAbundanceBySample,
                       fill = group,
                       subset = RelAbundanceBySample > 0
                   ),
         starshape = 13,
         starstroke = 0.25,
         offset = 0.02,
         pwidth = 0.05,
         grid.params = list(linetype=2)
      ) + scale_size_continuous(
         name="Relative Abundance (%)",
         range = c(1, 3)
      ) + scale_fill_manual(values=group_color)

print(tree_ogd_ce_t3)

Got:

Error in eval(parse(text = quo_name(object$mapping$subset))) : 
  object 'RelAbundanceBySample' not found

Any suggestion ? Thanks ~

Nov 25 '21 12:11 alienzj

Would you mind giving me the file? Or you can try the mp_plot_diff_res which is a new feature. you can update MicrobiotaProcess with remotes::install_github("YuLab-SMU/MicrobiotaProcess").

Nov 26 '21 03:11 xiangpin

Thanks so much ~ I will try mp_plot_diff_res.

tree_diff_mpse_ogd_control_empty.zip You can use this file to debug.

Nov 26 '21 03:11 alienzj

I can plot the object with the following codes.

library(ggtreeExtra)                                                                                                                                                                                               library(ggtree)
library(MicrobiotaProcess)
library(ggstar)
library(forcats)
library(ggplot2)

trda <- readRDS("./tree_diff_mpse_ogd_control_empty.rds")

trda

p <- ggtree(trda) +
     geom_hilight(
         mapping = aes(
           subset = nodeClass == "Phylum",
           node = node,
           fill = label
         )
     )
p2 <- p +
      ggnewscale::new_scale_fill() +
      geom_fruit(
         data = td_unnest(AbundanceBySample, names_repair=tidyr::tidyr_legacy),
         geom = geom_star,
         mapping = aes(
            x = fct_reorder(Sample, group, .fun=min),
            size = RelAbundanceBySample,
            fill = group,
            subset = RelAbundanceBySample > 0
         ),
         starshape = 13,
         pwidth = 2,
         grid.params = list(linetype=2)
      )
p3 <- p2 +
      ggnewscale::new_scale("fill") +
      geom_fruit(
         geom = geom_col,
         mapping = aes(
                       x = LDAmean,
                       fill = Sign_group,
                       subset = !is.na(LDAmean)
                       ),
         orientation = "y",
         offset = 0.1,
         pwidth = 1,
         axis.params = list(axis = "x",
                            title = "Log10(LDA)",
                            title.height = 0.1,
                            title.size = 2,
                            text.size = 1.8,
                            vjust = 1),
         grid.params = list(linetype = 1)
      ) +
      geom_tiplab(
         size = 6,
         as_ylab = TRUE
      ) +
      theme(
           legend.key.height = unit(0.3, "cm"),
           legend.key.width = unit(0.3, "cm"),
           legend.spacing.y = unit(0.02, "cm"),
           legend.text = element_text(size = 7),
           legend.title = element_text(size = 9),
      ) +
      scale_x_continuous(expand=c(0, 0.05))
p3

xx2

PS : I think it is better to run td_unnest with names_repair=tidyr::tidyr_legacy. this is because when there are the same column names between the tibble before unnest and the tibble after unnest. Please see the illustration below.

> library(tidyr)
> library(dplyr)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

> dat <- data.frame(A=c(1, 2, 3), B=c(4, 5, 6))
> dat %>% nest(a=c(A, B))
# A tibble: 1 × 1
  a
  <list>
1 <tibble [3 × 2]>
> dat %>% nest(a=c(A, B)) %>% mutate(A=1, B=2)
# A tibble: 1 × 3
  a                    A     B
  <list>           <dbl> <dbl>
1 <tibble [3 × 2]>     1     2
> dat %>% nest(a=c(A, B)) %>% mutate(A=1, B=2) %>% unnest(a)
Error: Names must be unique.
✖ These names are duplicated:
  * "A" at locations 1 and 3.
  * "B" at locations 2 and 4.
ℹ Use argument `names_repair` to specify repair strategy.
Run `rlang::last_error()` to see where the error occurred.
> dat %>% nest(a=c(A, B)) %>% mutate(A=1, B=2) %>% unnest(a, names_repair=tidyr_legacy)
# A tibble: 3 × 4
      A     B    A1    B1
  <dbl> <dbl> <dbl> <dbl>
1     1     4     1     2
2     2     5     1     2
3     3     6     1     2

Nov 26 '21 04:11 xiangpin

That's great ! Thanks so much !

Nov 26 '21 04:11 alienzj

@xiangpin Dear xiangpin, could you please show the R session info to me when run the above code ?

I found I can't reproduce the results of above code, there is still the same error: Error in eval(parse(text = quo_name(object$mapping$subset))) : object 'RelAbundanceBySample' not found.

Below is my R session info:

R version 4.0.5 (2021-03-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS/LAPACK: /home/zhujie/.conda/envs/bioenv3.8/lib/libopenblasp-r0.3.17.so

locale:
 [1] LC_CTYPE=en_HK.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_HK.UTF-8        LC_COLLATE=en_HK.UTF-8    
 [5] LC_MONETARY=en_HK.UTF-8    LC_MESSAGES=en_HK.UTF-8   
 [7] LC_PAPER=en_HK.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_HK.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] ggtreeExtra_1.0.2       ggstar_1.0.2            ggplotify_0.1.0        
 [4] ggtree_3.3.0.900        ggpubr_0.4.0            forcats_0.5.1          
 [7] stringr_1.4.0           dplyr_1.0.7             purrr_0.3.4            
[10] readr_2.0.2             tidyr_1.1.4             tibble_3.1.5           
[13] ggplot2_3.3.5           tidyverse_1.3.1         MicrobiotaProcess_1.7.2

loaded via a namespace (and not attached):
  [1] ggnewscale_0.4.5            TH.data_1.1-0              
  [3] colorspace_2.0-2            ggsignif_0.6.3             
  [5] ellipsis_0.3.2              modeltools_0.2-23          
  [7] rio_0.5.27                  XVector_0.30.0             
  [9] GenomicRanges_1.42.0        fs_1.5.0                   
 [11] aplot_0.1.1                 rstudioapi_0.13            
 [13] farver_2.1.0                ggrepel_0.9.1              
 [15] fansi_0.5.0                 mvtnorm_1.1-3              
 [17] lubridate_1.8.0             coin_1.4-2                 
 [19] xml2_1.3.2                  codetools_0.2-18           
 [21] splines_4.0.5               libcoin_1.0-9              
 [23] jsonlite_1.7.2              broom_0.7.9                
 [25] cluster_2.1.2               dbplyr_2.1.1               
 [27] compiler_4.0.5              httr_1.4.2                 
 [29] backports_1.2.1             assertthat_0.2.1           
 [31] Matrix_1.3-2                lazyeval_0.2.2             
 [33] cli_3.1.0                   gtable_0.3.0               
 [35] glue_1.4.2                  GenomeInfoDbData_1.2.4     
 [37] Rcpp_1.0.7                  carData_3.0-4              
 [39] Biobase_2.50.0              cellranger_1.1.0           
 [41] vctrs_0.3.8                 Biostrings_2.58.0          
 [43] ape_5.5                     nlme_3.1-153               
 [45] gghalves_0.1.1              iterators_1.0.13           
 [47] ggalluvial_0.12.3           openxlsx_4.2.4             
 [49] rvest_1.0.2                 lifecycle_1.0.1            
 [51] rstatix_0.7.0               zlibbioc_1.36.0            
 [53] MASS_7.3-54                 zoo_1.8-9                  
 [55] scales_1.1.1                ragg_1.2.0                 
 [57] hms_1.1.1                   MatrixGenerics_1.2.1       
 [59] parallel_4.0.5              SummarizedExperiment_1.20.0
 [61] sandwich_3.0-1              curl_4.3.2                 
 [63] gridExtra_2.3               dtplyr_1.1.0               
 [65] ggfun_0.0.4                 yulab.utils_0.0.4.901      
 [67] stringi_1.7.4               S4Vectors_0.28.1           
 [69] foreach_1.5.1               tidytree_0.3.5             
 [71] permute_0.9-5               BiocGenerics_0.36.1        
 [73] zip_2.2.0                   GenomeInfoDb_1.26.7        
 [75] systemfonts_1.0.3           rlang_0.4.12               
 [77] pkgconfig_2.0.3             matrixStats_0.61.0         
 [79] bitops_1.0-7                lattice_0.20-44            
 [81] labeling_0.4.2              treeio_1.19.0              
 [83] patchwork_1.1.1             tidyselect_1.1.1           
 [85] magrittr_2.0.1              R6_2.5.1                   
 [87] IRanges_2.24.1              generics_0.1.1             
 [89] multcomp_1.4-17             DelayedArray_0.16.3        
 [91] DBI_1.1.1                   pillar_1.6.4               
 [93] haven_2.4.3                 foreign_0.8-81             
 [95] withr_2.4.2                 mgcv_1.8-36                
 [97] abind_1.4-5                 survival_3.2-13            
 [99] RCurl_1.98-1.5              modelr_0.1.8               
[101] crayon_1.4.2                car_3.0-11                 
[103] utf8_1.2.2                  tzdb_0.1.2                 
[105] grid_4.0.5                  readxl_1.3.1               
[107] data.table_1.14.2           vegan_2.5-7                
[109] digest_0.6.28               reprex_2.0.1               
[111] textshaping_0.3.4           gridGraphics_0.5-1         
[113] stats4_4.0.5                munsell_0.5.0

Dec 01 '21 06:12 alienzj

I think it can work by updating ggtreeExtra. Ps: you also can check whether the RelAbundanceBySample is exits when you unnest the treedata.

> tr.da %>% as_tibble %>% colnames()
 [1] "parent"                       "node"
 [3] "label"                        "nodeClass"
 [5] "nodeDepth"                    "AbundanceBySample"
 [7] "AbundanceByendoscopy_type"    "AbundanceBygroup"
 [9] "AbundanceByparticle_size"     "RelAbundanceBySampleBySample"
[11] "LDAupper"                     "LDAmean"
[13] "LDAlower"                     "Sign_group"
[15] "pvalue"                       "fdr"
> tr.da %>% unnest(AbundanceBySample) %>% colnames()
# A tbl_df is returned for independent data analysis.
 [1] "node"                         "label"
 [3] "isTip"                        "nodeClass"
 [5] "nodeDepth"                    "Sample"
 [7] "endoscopy_type"               "group"
 [9] "particle_size"                "Abundance"
[11] "RelAbundanceBySample"         "AbundanceByendoscopy_type"
[13] "AbundanceBygroup"             "AbundanceByparticle_size"
[15] "RelAbundanceBySampleBySample" "LDAupper"
[17] "LDAmean"                      "LDAlower"
[19] "Sign_group"                   "pvalue"
[21] "fdr"

This is my R session info

R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS:   /mnt/d/UbuntuApps/R/4.1.1/lib/R/lib/libRblas.so
LAPACK: /mnt/d/UbuntuApps/R/4.1.1/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] forcats_0.5.1               ggplot2_3.3.5
[3] ggstar_1.0.2                MicrobiotaProcess_1.7.3.990
[5] ggtree_3.3.0.900            ggtreeExtra_1.5.1

loaded via a namespace (and not attached):
 [1] MatrixGenerics_1.6.0        Biobase_2.54.0
 [3] tidyr_1.1.4                 jsonlite_1.7.2
 [5] splines_4.1.1               foreach_1.5.1
 [7] assertthat_0.2.1            stats4_4.1.1
 [9] yulab.utils_0.0.4.901       coin_1.4-2
[11] GenomeInfoDbData_1.2.7      ggrepel_0.9.1
[13] pillar_1.6.4                lattice_0.20-45
[15] glue_1.5.0                  GenomicRanges_1.46.0
[17] XVector_0.34.0              ggsignif_0.6.3
[19] colorspace_2.0-2            sandwich_3.0-1
[21] ggfun_0.0.4                 Matrix_1.3-4
[23] ggnewscale_0.4.5            pkgconfig_2.0.3
[25] zlibbioc_1.40.0             purrr_0.3.4
[27] mvtnorm_1.1-3               patchwork_1.1.1
[29] tidytree_0.3.6              scales_1.1.1
[31] ggplotify_0.1.0             tibble_3.1.6
[33] mgcv_1.8-38                 generics_0.1.1
[35] IRanges_2.28.0              ellipsis_0.3.2
[37] withr_2.4.2                 TH.data_1.1-0
[39] SummarizedExperiment_1.24.0 BiocGenerics_0.40.0
[41] lazyeval_0.2.2              cli_3.1.0
[43] survival_3.2-13             magrittr_2.0.1
[45] crayon_1.4.2                fansi_0.5.0
[47] nlme_3.1-153                MASS_7.3-54
[49] vegan_2.5-7                 tools_4.1.1
[51] lifecycle_1.0.1             matrixStats_0.61.0
[53] multcomp_1.4-17             aplot_0.1.1
[55] S4Vectors_0.32.0            munsell_0.5.0
[57] cluster_2.1.2               DelayedArray_0.20.0
[59] Biostrings_2.62.0           compiler_4.1.1
[61] GenomeInfoDb_1.30.0         gridGraphics_0.5-1
[63] rlang_0.4.12                grid_4.1.1
[65] RCurl_1.98-1.5              iterators_1.0.13
[67] bitops_1.0-7                gtable_0.3.0
[69] codetools_0.2-18            DBI_1.1.1
[71] R6_2.5.1                    gridExtra_2.3
[73] zoo_1.8-9                   dplyr_1.0.7
[75] utf8_1.2.2                  libcoin_1.0-9
[77] treeio_1.18.0               permute_0.9-5
[79] ape_5.5-2                   modeltools_0.2-23
[81] parallel_4.1.1              Rcpp_1.0.7
[83] vctrs_0.3.8                 tidyselect_1.1.1

Dec 01 '21 08:12 xiangpin

Thanks so much ! I updated ggtreeExtra, now it work.

Dec 01 '21 08:12 alienzj

Hi, I am running biomarker identification step deres <- diff_analysis(obj = ps, classgroup = "Treatment", mlfun = "lda", filtermod = "pvalue", firstcomfun = "kruskal_test", firstalpha = 0.05, strictmod = TRUE, secondcomfun = "wilcox_test", subclmin = 3, subclwilc = TRUE, secondalpha = 0.01, lda=3) and am getting Error in lda.default(x, grouping, ...) : variables 33 37 appear to be constant within groups.

Is there a way how to filter out the constant variables? I am limited in my R skills and suggestions would be welcome :) Thanks, Ilma

Jan 25 '22 13:01 ilma74

@ilma74 hello, which version of MicrobiotaProcess are you using? I cannot reproduce your issue. Would you mind sending me the dataset by email? I will not distribute this data without your permission. My email is [email protected]

Jan 26 '22 06:01 xiangpin

MicrobiotaProcess MicrobiotaProcess copied to clipboard

Biomarker discovery explanation and new package's feature discussion

MicrobiotaProcess
MicrobiotaProcess copied to clipboard