MicrobiotaProcess
MicrobiotaProcess copied to clipboard
About mp_rrarefy and mp_cal_rarecurve
Hi, I want to analyze the results of metaphlan using MicrobiotaProcess, below is my code:
mpse2 <- mp_import_metaphlan(profile='metaphlan/treated_array.txt', mapfilename='metaphlan/adjmeta.txt')
mpse2
mpse2 %<>% mp_rrarefy()
mpse2 %<>% mp_cal_rarecurve(.abundance = RareAbundance,chunks = 400)
Treated_array.txt is a merged metaphlan result. I converted the relative abundance to absolute abundance according to the number of reads. [treated_array.txt](https:// adjmeta.txt github.com/YuLab-SMU/MicrobiotaProcess/files/11269250/treated_array.txt)
I found that the calculation of mp_rrarefy and mp_cal_rarecurve is very slow, mp_rrarefy took me about half an hour, as for mp_cal_rarecurve I waited for two hours and still didn't get the result (so I terminated the program). I don't quite understand why it's taking so much time, and how can I fix it?
Thanks, LeeLee
The treated_array.txt file does not seem to be uploaded well, I will upload it again treated_array.txt
It seems the absolute abundance is too big. The mp_rrarefy
is a wrapped function of vegan::rrarefy
to process MPSE
object.
I suggest that you can refer to the method (converted the relative abundance to absolute abundance) of curatedMetagenomicData
package. the method is round(relativate abundance * number of reads / 100)
> mpse <- mp_import_metaphlan('./treated_array.txt')
> mpse
# A MPSE-tibble (MPSE object) abstraction: 506 × 11
# OTU=46 | Samples=11 | Assays=Abundance | Taxonomy=Kingdom, Phylum, Class, Order, Family, Genus, Speies
OTU Sample Abundance taxid Kingdom Phylum Class Order Family Genus Speies
<chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 t__SGB… 14792… 507514084 2|12… k__Bac… p__Fi… c__B… o__L… f__La… g__L… s__La…
2 t__SGB… 14792… 356832609 2|20… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B… s__Bi…
3 t__SGB… 14792… 288544185 2|12… k__Bac… p__Fi… c__B… o__L… f__La… g__L… s__La…
4 t__SGB… 14792… 240335596 2|12… k__Bac… p__Pr… c__B… o__N… f__Ne… g__S… s__Sn…
5 t__SGB… 14792… 214801099 2|12… k__Bac… p__Fi… c__B… o__L… f__La… g__B… s__Bo…
6 t__SGB… 14792… 93028717 2|20… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B… s__Bi…
7 t__SGB… 14792… 90766276 2|12… k__Bac… p__Pr… c__G… o__O… f__Or… g__F… s__Fr…
8 t__SGB… 14792… 83527737 2|12… k__Bac… p__Fi… c__B… o__L… f__La… g__B… s__Bo…
9 t__SGB… 14792… 32917664 2|12… k__Bac… p__Pr… c__A… o__H… f__Ba… g__B… s__Ba…
10 t__SGB… 14792… 13227998 2|20… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B… s__Bi…
# ℹ 496 more rows
# ℹ Use `print(n = ...)` to see more rows
> mpse %>% mp_extract_assays(.abundance=Abundance) %>% colSums()
14792_metaphlan 14793_metaphlan 14794_metaphlan 14795_metaphlan 14796_metaphlan
1925809008 1294574335 2326667790 1975230225 1343337208
14797_metaphlan 14798_metaphlan 14815_metaphlan 14816_metaphlan 14817_metaphlan
1574659346 1625652607 1664513744 2181800294 1470224228
14818_metaphlan
2115662871
> mpse %>% dplyr::mutate(Abundance=round(Abundance/100)) %>% mp_rrarefy()
# A MPSE-tibble (MPSE object) abstraction: 506 × 12
# OTU=46 | Samples=11 | Assays=Abundance, RareAbundance | Taxonomy=Kingdom, Phylum, Class, Order, Family, Genus, Speies
OTU Sample Abundance RareAbundance taxid Kingdom Phylum Class Order Family
<chr> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
1 t__SG… 14792… 5075141 3411974 2|12… k__Bac… p__Fi… c__B… o__L… f__La…
2 t__SG… 14792… 3568326 2399304 2|20… k__Bac… p__Ac… c__A… o__B… f__Bi…
3 t__SG… 14792… 2885442 1939746 2|12… k__Bac… p__Fi… c__B… o__L… f__La…
4 t__SG… 14792… 2403356 1614275 2|12… k__Bac… p__Pr… c__B… o__N… f__Ne…
5 t__SG… 14792… 2148011 1444179 2|12… k__Bac… p__Fi… c__B… o__L… f__La…
6 t__SG… 14792… 930287 625491 2|20… k__Bac… p__Ac… c__A… o__B… f__Bi…
7 t__SG… 14792… 907663 610169 2|12… k__Bac… p__Pr… c__G… o__O… f__Or…
8 t__SG… 14792… 835277 561343 2|12… k__Bac… p__Fi… c__B… o__L… f__La…
9 t__SG… 14792… 329177 221201 2|12… k__Bac… p__Pr… c__A… o__H… f__Ba…
10 t__SG… 14792… 132280 88995 2|20… k__Bac… p__Ac… c__A… o__B… f__Bi…
# ℹ 496 more rows
# ℹ 2 more variables: Genus <chr>, Speies <chr>
# ℹ Use `print(n = ...)` to see more rows
Yes, this greatly improves the speed of mp_rrarefy, but it seems to be still slow for mp_cal_rarecurve
mp_cal_rarecurve took around 8 hours for me. No multiple cores unfortunately.