microeco

subsetting taxa

Open apoosakkannu opened this issue 2 years ago • 15 comments

Hi, I wonder whether it is possible to subset an object by specific taxa. Please let me know. Thanks in advance!

apoosakkannu avatar Sep 09 '22 13:09 apoosakkannu

Hi. Could you give a toy example?

ChiLiubio avatar Sep 10 '22 07:09 ChiLiubio

Hi. Something like the example below for a phyloseq object:

GP.chl = subset_taxa(GlobalPatterns, Phylum=="Chlamydiae")

apoosakkannu avatar Sep 11 '22 10:09 apoosakkannu

Hi. Please try to use the following steps:

dataset$tax_table %<>% subset(Phylum=="Chlamydiae")
dataset$tidy_dataset()

I think the base R subset function is enough for this operation, so I have not added a function like subset_taxa from the phyloseq package. I will consider your example. Thanks.

Chi

ChiLiubio avatar Sep 12 '22 00:09 ChiLiubio
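
The two steps above can be tried on the example object that ships with microeco. This is only a minimal sketch: it assumes the built-in `dataset` and its `p__Proteobacteria` label as stand-ins for the user's own data, and it uses a plain assignment instead of `%<>%`.

```r
library(microeco)

# Example microtable object shipped with microeco (stand-in for real data)
data(dataset)

# Work on a copy so the original object stays intact
proteo <- clone(dataset)

# Keep only the taxa assigned to one phylum
proteo$tax_table <- subset(proteo$tax_table, Phylum == "p__Proteobacteria")

# Trim the other tables in the object to the taxa left in tax_table
proteo$tidy_dataset()

# Inspect the result: only the selected phylum should remain
table(proteo$tax_table$Phylum)
nrow(proteo$otu_table)
```

After `tidy_dataset()`, the feature names in `otu_table` and `tax_table` should refer to the same set of taxa.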

Thanks. It kind of worked, but I have a problem when I run the differential abundance analysis. In the code below, I subset the dataset to a specific taxon and store the result as dataset1. But the differential analysis on dataset1 still shows other taxa too. I have attached the dataset for your reference. Please give me an idea of where it goes wrong!

## upload all the data files

# upload the OTU abundance file
abund_table <- read.csv("/scratch/project_2003832/pacbio_rawdata/bird_bat/stat/decontam_all_dataset/otu_table_decontam_all.csv", row.names = 1, check.names = FALSE)

# upload the metadata file
meta_table <- read.csv("/scratch/project_2003832/pacbio_rawdata/bird_bat/stat/decontam_all_dataset/sample_table_decontam_all.csv", row.names = 1, check.names = FALSE)

# convert the metadata variables to factors
col_names <- names(meta_table)
meta_table[, col_names] <- lapply(meta_table[, col_names], factor)

# upload the tree
# requires the "ape" package
library(ape) # version 5.5

OTU_tree <- read.tree("/scratch/project_2003832/pacbio_rawdata/bird_bat/stat/decontam_all_dataset/phylo_tree_decontam_all.tre")

# load the taxonomy
OTU_taxonomy <- read.csv("/scratch/project_2003832/pacbio_rawdata/bird_bat/stat/decontam_all_dataset/tax_table_decontam_all.csv", row.names = 1, check.names = FALSE)

# unify the taxonomic information (very important)
# requires the "tidytree" and "microeco" packages
library(tidytree) # version 0.3.5
library(microeco) # version 0.11.0

OTU_taxonomy %<>% tidy_taxonomy

## create and clean up the data object
# the packages "microeco" and "tidytree" are already loaded

# create a dataset from all the uploaded data
dataset <- microtable$new(sample_table = meta_table, otu_table = abund_table, tax_table = OTU_taxonomy, phylo_tree = OTU_tree)

# make the OTU and sample information consistent across all files in the dataset object
dataset$tidy_dataset()

dataset

# calculate alpha diversity
dataset$cal_alphadiv(PD = TRUE)

# calculate beta diversity
dataset$cal_betadiv(unifrac = TRUE)

# calculate the abundance
dataset$cal_abund()

save(dataset, file = "dataset.RData")

## select only Campylobacter taxa
dataset1 <- clone(dataset)
dataset1$tax_table %<>% subset(Genus == "g__Campylobacter")
dataset1$tidy_dataset()
dataset1

# identify the differentially abundant taxa for 0.05% relative abundance
t4 <- trans_diff$new(dataset = dataset1, method = "KW_dunn", group = "Host_taxa", taxa_level = "OTU", filter_thres = 0.005)
t4$res_diff

dataset.rdata.zip

apoosakkannu avatar Sep 12 '22 09:09 apoosakkannu

Hi. You need to run dataset1$cal_abund() before the differential test. This updates taxa_abund, which trans_diff uses; otherwise the cloned object still carries the abundance tables calculated on the full dataset.

ChiLiubio avatar Sep 14 '22 00:09 ChiLiubio
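
A minimal sketch of that order of operations, assuming the example `dataset` bundled with microeco and a phylum-level filter as stand-ins for the user's data and genus filter. The differential test itself is left out here; it would follow the last step.

```r
library(microeco)
data(dataset)

d1 <- clone(dataset)

# Subset the taxonomy and trim the rest of the object
d1$tax_table <- subset(d1$tax_table, Phylum == "p__Proteobacteria")
d1$tidy_dataset()

# Recalculate the abundance tables on the subset so that
# taxa_abund no longer refers to the full community
d1$cal_abund()

# The phylum-level table should now list only the selected phylum
rownames(d1$taxa_abund$Phylum)
```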

Thanks, but then it will not be the original abundance, right? Because I subset only specific taxa! Please let me know whether I can just plot the taxon g__Campylobacter without any subsetting.

apoosakkannu avatar Sep 14 '22 06:09 apoosakkannu

Hi. I got it! Please run the following steps and check whether it is correct.

# add OTU as a taxonomic level in tax_table
dataset$add_rownames2taxonomy(use_name = "OTU")
# cal_abund can return taxa_abund with the last data.frame named OTU
dataset$cal_abund()
# keep only the rows whose names contain g__Campylobacter
dataset$taxa_abund$OTU %<>% .[grepl("g__Campylobacter", rownames(.)), ]
# run the test on the same object whose taxa_abund was filtered
t4 <- trans_diff$new(dataset = dataset, method = "KW_dunn", group = "Host_taxa", taxa_level = "OTU", filter_thres = 0.005)

ChiLiubio avatar Sep 15 '22 11:09 ChiLiubio
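
The difference between the two approaches in this thread (subsetting the taxa and recalculating, versus filtering the already calculated abundance table) can be seen on a toy count table. The sketch below uses base R only and made-up numbers; it illustrates the arithmetic, not microeco's internal code.

```r
# Toy count table: rows are OTUs, columns are samples (made-up numbers)
counts <- matrix(
  c(10, 30,
    40, 20,
    50, 50),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("otu_campy", "otu_other1", "otu_other2"), c("S1", "S2"))
)

# Relative abundance within the full community
rel_full <- sweep(counts, 2, colSums(counts), "/")

# Filter the abundance table afterwards: the original proportions are kept
rel_kept <- rel_full["otu_campy", , drop = FALSE]

# Subset the counts first and recalculate: proportions are rescaled
counts_sub <- counts["otu_campy", , drop = FALSE]
rel_renorm <- sweep(counts_sub, 2, colSums(counts_sub), "/")

rel_kept    # 0.1 in S1, 0.3 in S2
rel_renorm  # 1 in both samples
```

With a single OTU left, recalculating always gives 1, which is why filtering the existing table is the option that preserves the original relative abundance.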

Thanks. It worked.

apoosakkannu avatar Sep 15 '22 12:09 apoosakkannu

Hi, I wonder whether it is possible to subset some taxa without affecting their original abundance or counts for the alpha and beta diversity analysis.

apoosakkannu avatar Nov 15 '22 13:11 apoosakkannu

Hi. Could you please explain it in more detail? I did not understand.

ChiLiubio avatar Nov 16 '22 01:11 ChiLiubio

I need a microeco object that contains data only for the following potentially pathogenic genera:

g__Acinetobacter|g__Aeromonas|g__Anaplasma|g__Bacillus|g__Bartonella|g__Borrelia|g__Campylobacter|g__Chlamydia|g__Citrobacter|g__Clostridiodes|g__Clostridium|g__Corynebacterium|g__Coxiella|g__Ehrilchia|g__Enterobacter|g__Enterococcus|g__Escherichia|g__Francisella|g__Klebsiella|g__Listeria|g__Mycobacterium|g__Mycoplasma|g__Neoehrilchia|g__Pasteurella|g__Proteus|g__Pseudomonas|g__Rickettsia|g__Salmonella|g__Serratia|g__Staphylococcus|g__Streptococcus|g__Vibrio|g__Yersinia

and then to proceed with the normal analyses, such as alpha and beta diversity.

apoosakkannu avatar Nov 16 '22 07:11 apoosakkannu

Actually, something like what is described in this issue: https://github.com/joey711/phyloseq/issues/1048

apoosakkannu avatar Nov 16 '22 10:11 apoosakkannu

Hi. You can directly manipulate the tax_table in your microtable object, like this:

library(microeco)
data(dataset)
newdata <- clone(dataset)
# option 1: your_selected_genera is a character vector with one genus per element
newdata$tax_table <- subset(newdata$tax_table,  Genus %in% your_selected_genera)
# option 2: all genera are in one string separated by "|", as in the list you pasted
# (g__Acinetobacter|g__Aeromonas|g__Anaplasma is used here as an example)
newdata$tax_table <- newdata$tax_table[grepl("g__Acinetobacter|g__Aeromonas|g__Anaplasma", newdata$tax_table$Genus), ]
# then trim the rest of the object to the remaining taxa
newdata$tidy_dataset()

ChiLiubio avatar Nov 16 '22 11:11 ChiLiubio
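
The two options above do not always select the same rows: `grepl` matches substrings, while `%in%` requires the whole genus name to match. The base R sketch below shows this on a made-up taxonomy table (the OTU names and genus labels are hypothetical), and also how a "|"-separated string can be turned into a vector for exact matching.

```r
# Toy taxonomy table (made-up entries)
tax <- data.frame(
  Genus = c("g__Bacillus", "g__Paenibacillus", "g__Vibrio",
            "g__Clostridium_sensu_stricto_1"),
  row.names = paste0("OTU_", 1:4)
)

pattern <- "g__Bacillus|g__Vibrio|g__Clostridium"

# Regular-expression match: also keeps names that merely contain a pattern
by_regex <- tax[grepl(pattern, tax$Genus), , drop = FALSE]

# Exact match: split the string on the literal "|" first
genera <- strsplit(pattern, "|", fixed = TRUE)[[1]]
by_exact <- subset(tax, Genus %in% genera)

rownames(by_regex)  # "OTU_1" "OTU_3" "OTU_4"
rownames(by_exact)  # "OTU_1" "OTU_3"
```

Which behaviour is wanted depends on how the genera are labelled in the taxonomy file, so it is worth checking the selected names before running `tidy_dataset()`.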

Thanks. The subsetting of taxa worked, but it did not affect the alpha and beta diversity calculations. They remain the same as for the original data.

apoosakkannu avatar Nov 16 '22 12:11 apoosakkannu

You should rerun the diversity calculations for the new data.

ChiLiubio avatar Nov 16 '22 15:11 ChiLiubio
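
A minimal sketch of rerunning the calculations on the subset, assuming the example `dataset` bundled with microeco and a phylum-level filter in place of the genus list. `PD = FALSE` and `unifrac = FALSE` are chosen here only to keep the example free of tree-based steps; the thread's own code uses `PD = TRUE` and `unifrac = TRUE`.

```r
library(microeco)
data(dataset)

newdata <- clone(dataset)

# Subset the taxonomy and trim the object
newdata$tax_table <- subset(newdata$tax_table, Phylum == "p__Proteobacteria")
newdata$tidy_dataset()

# Diversity results stored in the object are not refreshed automatically,
# so recalculate them on the trimmed object
newdata$cal_alphadiv(PD = FALSE)
newdata$cal_betadiv(unifrac = FALSE)

head(newdata$alpha_diversity)
names(newdata$beta_diversity)
```

Any diversity tables carried over by `clone()` are overwritten by these calls, so later analyses use values computed from the selected taxa only.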