MicrobiotaProcess icon indicating copy to clipboard operation
MicrobiotaProcess copied to clipboard

can't calculate metaphlan's output

Open shaodongyan opened this issue 3 years ago • 6 comments

when i do mpse3 <- mp_import_metaphlan(profile="merged_abundance_table.txt", mapfilename="meta.txt") mpse3 %<>% mp_rrarefy() This is the output Error in vegan::rrarefy(x = .data, sample = raresize) : function is meaningful only for integers (counts)

shaodongyan avatar Nov 25 '21 08:11 shaodongyan

please refer to this issues. I recommend using the latest version, because some important features were added, you can use remotes::install_github("YuLab-SMU/MicrobiotaProcess").

xiangpin avatar Nov 25 '21 08:11 xiangpin

请参考这个问题。我建议使用最新版本,因为添加了一些重要功能,您可以使用remotes::install_github("YuLab-SMU/MicrobiotaProcess").

thankyou!So all the orders I used mp_*?

shaodongyan avatar Nov 25 '21 08:11 shaodongyan

Yes, you can also refer to the vignette.

xiangpin avatar Nov 25 '21 08:11 xiangpin

sorry,i have updated MicrobiotaProcess from git,But it can'work like below

是的,您也可以参考小插图

shaodongyan avatar Nov 25 '21 08:11 shaodongyan

What is the error, please provide the example. As mentioned in the issues. The mp_rrarfy need your .abundace should be integers (count), but the output of metaphlan3 is the relative abundance (adjusted by the genome length and the sample depth), you can multiple the sample depth, then round to keep the integer part (this is also the solution of curatedMetagenomicData). Or you can use the relative abundance to perform the following analysis directly with force=TRUE in some functions (?mp_cal_alpha to see the help information of the function). This means the specified .abundance will be used to calculate directly not be rarefied (default)

xiangpin avatar Nov 25 '21 09:11 xiangpin

The mpse3 is from the output of MetaPhlAn3

mpse3.obj.zip

> mpse3
# A MPSE-tibble (MPSE object) abstraction: 33,464 × 36
# OTU=356 | Samples=94 | Assays=Abundance | Taxanomy=Kingdom, Phylum, Class, Order, Family, Genus
   OTU        Sample Abundance study_name subject_id body_site antibiotics_curr…
   <chr>      <chr>      <dbl> <chr>      <chr>      <chr>     <chr>
 1 s__Fusica… S1_a_…     23.2  IjazUZ_20… subIJ_1    stool     no
 2 s__Bifido… S1_a_…     16.7  IjazUZ_20… subIJ_1    stool     no
 3 s__Collin… S1_a_…     11.9  IjazUZ_20… subIJ_1    stool     no
 4 s__Rumino… S1_a_…     10.9  IjazUZ_20… subIJ_1    stool     no
 5 s__Bifido… S1_a_…     10.7  IjazUZ_20… subIJ_1    stool     no
 6 s__Bifido… S1_a_…      5.77 IjazUZ_20… subIJ_1    stool     no
 7 s__Anaero… S1_a_…      4.03 IjazUZ_20… subIJ_1    stool     no
 8 s__Dorea_… S1_a_…      3.81 IjazUZ_20… subIJ_1    stool     no
 9 s__Agatho… S1_a_…      2.15 IjazUZ_20… subIJ_1    stool     no
10 s__Eggert… S1_a_…      1.74 IjazUZ_20… subIJ_1    stool     no
# … with 33,454 more rows, and 29 more variables: study_condition <chr>,
#   disease <chr>, age <int>, age_category <chr>, gender <chr>, country <chr>,
#   non_westernized <chr>, sequencing_platform <chr>, PMID <chr>,
#   number_reads <int>, number_bases <dbl>, minimum_read_length <int>,
#   median_read_length <int>, NCBI_accession <chr>, curator <chr>,
#   family <chr>, days_from_first_collection <int>, family_role <chr>,
#   disease_subtype <chr>, treatment <chr>, disease_location <chr>, …

First, you can calculate the relative abundance for the sample or group level

> mpse3 %<>% mp_cal_abundance(.abundance=Abundance, force=TRUE, relative=TRUE) %>% mp_cal_abundance(.abundance=Abundance, .group=disease, force=TRUE, relative=TRUE)
> mpse3
# A MPSE-tibble (MPSE object) abstraction: 33,464 × 37
# OTU=356 | Samples=94 | Assays=Abundance, RelAbundanceBySample | Taxanomy=Kingdom, Phylum, Class, Order, Family, Genus
   OTU        Sample  Abundance RelAbundanceByS… study_name subject_id body_site
   <chr>      <chr>       <dbl>            <dbl> <chr>      <chr>      <chr>
 1 s__Fusica… S1_a_W…     23.2             23.2  IjazUZ_20… subIJ_1    stool
 2 s__Bifido… S1_a_W…     16.7             16.7  IjazUZ_20… subIJ_1    stool
 3 s__Collin… S1_a_W…     11.9             11.9  IjazUZ_20… subIJ_1    stool
 4 s__Rumino… S1_a_W…     10.9             10.9  IjazUZ_20… subIJ_1    stool
 5 s__Bifido… S1_a_W…     10.7             10.7  IjazUZ_20… subIJ_1    stool
 6 s__Bifido… S1_a_W…      5.77             5.77 IjazUZ_20… subIJ_1    stool
 7 s__Anaero… S1_a_W…      4.03             4.03 IjazUZ_20… subIJ_1    stool
 8 s__Dorea_… S1_a_W…      3.81             3.82 IjazUZ_20… subIJ_1    stool
 9 s__Agatho… S1_a_W…      2.15             2.15 IjazUZ_20… subIJ_1    stool
10 s__Eggert… S1_a_W…      1.74             1.74 IjazUZ_20… subIJ_1    stool
# … with 33,454 more rows, and 30 more variables:
#   antibiotics_current_use <chr>, study_condition <chr>, disease <chr>,
#   age <int>, age_category <chr>, gender <chr>, country <chr>,
#   non_westernized <chr>, sequencing_platform <chr>, PMID <chr>,
#   number_reads <int>, number_bases <dbl>, minimum_read_length <int>,
#   median_read_length <int>, NCBI_accession <chr>, curator <chr>,
#   family <chr>, days_from_first_collection <int>, family_role <chr>, …

Then you can perform the different analysis with mp_diff_analysis

> mpse3 %<>% mp_diff_analysis(.abundance=Abundance, force=TRUE, relative=TRUE, .group=disease)
> mpse3
# A MPSE-tibble (MPSE object) abstraction: 33,464 × 43
# OTU=356 | Samples=94 | Assays=Abundance, RelAbundanceBySample | Taxanomy=Kingdom, Phylum, Class, Order, Family, Genus
   OTU        Sample  Abundance RelAbundanceByS… study_name subject_id body_site
   <chr>      <chr>       <dbl>            <dbl> <chr>      <chr>      <chr>
 1 s__Fusica… S1_a_W…     23.2             23.2  IjazUZ_20… subIJ_1    stool
 2 s__Bifido… S1_a_W…     16.7             16.7  IjazUZ_20… subIJ_1    stool
 3 s__Collin… S1_a_W…     11.9             11.9  IjazUZ_20… subIJ_1    stool
 4 s__Rumino… S1_a_W…     10.9             10.9  IjazUZ_20… subIJ_1    stool
 5 s__Bifido… S1_a_W…     10.7             10.7  IjazUZ_20… subIJ_1    stool
 6 s__Bifido… S1_a_W…      5.77             5.77 IjazUZ_20… subIJ_1    stool
 7 s__Anaero… S1_a_W…      4.03             4.03 IjazUZ_20… subIJ_1    stool
 8 s__Dorea_… S1_a_W…      3.81             3.82 IjazUZ_20… subIJ_1    stool
 9 s__Agatho… S1_a_W…      2.15             2.15 IjazUZ_20… subIJ_1    stool
10 s__Eggert… S1_a_W…      1.74             1.74 IjazUZ_20… subIJ_1    stool
# … with 33,454 more rows, and 36 more variables:
#   antibiotics_current_use <chr>, study_condition <chr>, disease <chr>,
#   age <int>, age_category <chr>, gender <chr>, country <chr>,
#   non_westernized <chr>, sequencing_platform <chr>, PMID <chr>,
#   number_reads <int>, number_bases <dbl>, minimum_read_length <int>,
#   median_read_length <int>, NCBI_accession <chr>, curator <chr>,
#   family <chr>, days_from_first_collection <int>, family_role <chr>, …
>

Finally, you can visualize the result of different analysis with mp_plot_diff_res

mpse3 %>%  mp_plot_diff_res(pwidth.abun=0.03, tiplab.size=1.2, offset.effsize=0.5)

xx

xiangpin avatar Nov 26 '21 02:11 xiangpin