phyloseq how to find ASVs present in different "sites"

how to find ASVs present in different "sites"

Open callmcgovern opened this issue 1 year ago • 9 comments

Hello, I am trying to look at the ASVs that are similar between different sites. I have sites in my sample_data and I have been able to make a venn diagram of overlapping vs non-overlapping ASVs in different sites. However, I would now like to look at a list of those ASVs which are shared by two particular samples. Does anyone know how I could do this? Thanks!

Jan 27 '23 21:01 callmcgovern

> ps.rare
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 2864 taxa and 893 samples ]
sample_data() Sample Data:       [ 893 samples by 10 sample variables ]
tax_table()   Taxonomy Table:    [ 2864 taxa by 7 taxonomic ranks ]
> 
> ps.NY <- subset_samples(ps.rare, site == "NY")
> ps.NY<- prune_taxa(taxa_sums(ps.NY) > 0, ps.NY) # prune OTUs that are not present in at least one sample
> 
> ps.NJ <- subset_samples(ps.rare, site == "NJ")
> ps.NJ <- prune_taxa(taxa_sums(ps.NJ) > 0, ps.NJ) # prune OTUs that are not present in at least one sample
> 
> ps.NY
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 951 taxa and 349 samples ]
sample_data() Sample Data:       [ 349 samples by 10 sample variables ]
tax_table()   Taxonomy Table:    [ 951 taxa by 7 taxonomic ranks ]
> ps.NJ
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 1141 taxa and 204 samples ]
sample_data() Sample Data:       [ 204 samples by 10 sample variables ]
tax_table()   Taxonomy Table:    [ 1141 taxa by 7 taxonomic ranks ]
> 
> length(taxa_names(ps.NY))
[1] 951
> length(taxa_names(ps.NJ))
[1] 1141
> length(intersect(taxa_names(ps.NY), taxa_names(ps.NJ)))
[1] 317
>write.csv(intersect(taxa_names(ps.NY), taxa_names(ps.NJ)), "shared_asvs_NY_NJ.csv")

Feb 02 '23 18:02 jeremysutherland

Hi, thank you for your help! How would you do this if you wanted to find ASVs that are found in all of your samples in the phyloseq object?

May 15 '23 18:05 callmcgovern

nrow(ps@otu_table) ps.prune <- prune_taxa(taxa_sums(ps) > 0, ps) # prune ASVs that are not present in at least one sample nrow(ps.prune@otu_table)

May 15 '23 19:05 jeremysutherland

Hi Jeremy, thanks for your quick reply. This does not seem to work. Here is the code and output on my end:

nrow(ps_dolphin@otu_table) [1] 15

ps_dolphin phyloseq-class experiment-level object otu_table() OTU Table: [ 1899 taxa and 15 samples ] sample_data() Sample Data: [ 15 samples by 4 sample variables ] tax_table() Taxonomy Table: [ 1899 taxa by 7 taxonomic ranks ] refseq() DNAStringSet: [ 1899 reference sequences ]

ps.prune <- prune_taxa(taxa_sums(ps_dolphin) > 0, ps_dolphin) nrow(ps.prune@otu_table) [1] 15

ps.prune phyloseq-class experiment-level object otu_table() OTU Table: [ 1899 taxa and 15 samples ] sample_data() Sample Data: [ 15 samples by 4 sample variables ] tax_table() Taxonomy Table: [ 1899 taxa by 7 taxonomic ranks ] refseq() DNAStringSet: [ 1899 reference sequences ]

Any ideas as to why it wouldnt be working?

May 15 '23 20:05 callmcgovern

Could be that you don't have any ASVs with all zero counts. (All ASVs have at least 1 count in at least one sample).

This should tell you if thats the case. (Assuming samples are rownames)

sum(colSums(ps_dolphin@otu_table)<1)

May 15 '23 21:05 kodelogue

ope. Kodelogue is my other (fun) account. but that should do the trick.

May 16 '23 15:05 jeremysutherland

Hi, I'm new to Github. I found this answer that was very helpful for my project:

> ps.rare
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 2864 taxa and 893 samples ]
sample_data() Sample Data:       [ 893 samples by 10 sample variables ]
tax_table()   Taxonomy Table:    [ 2864 taxa by 7 taxonomic ranks ]
> 
> ps.NY <- subset_samples(ps.rare, site == "NY")
> ps.NY<- prune_taxa(taxa_sums(ps.NY) > 0, ps.NY) # prune OTUs that are not present in at least one sample
> 
> ps.NJ <- subset_samples(ps.rare, site == "NJ")
> ps.NJ <- prune_taxa(taxa_sums(ps.NJ) > 0, ps.NJ) # prune OTUs that are not present in at least one sample
> 
> ps.NY
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 951 taxa and 349 samples ]
sample_data() Sample Data:       [ 349 samples by 10 sample variables ]
tax_table()   Taxonomy Table:    [ 951 taxa by 7 taxonomic ranks ]
> ps.NJ
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 1141 taxa and 204 samples ]
sample_data() Sample Data:       [ 204 samples by 10 sample variables ]
tax_table()   Taxonomy Table:    [ 1141 taxa by 7 taxonomic ranks ]
> 
> length(taxa_names(ps.NY))
[1] 951
> length(taxa_names(ps.NJ))
[1] 1141
> length(intersect(taxa_names(ps.NY), taxa_names(ps.NJ)))
[1] 317
>write.csv(intersect(taxa_names(ps.NY), taxa_names(ps.NJ)), "shared_asvs_NY_NJ.csv")

I wanted to ask: How could I obtain that same list of shared ASVs between both groups but also associated with the taxonomic classification found in the tax_table? In other words, I want to obtain a table (like the one in the attached image) that shows each shared ASV (for example: ASV1, ASV5, ASV30, etc.) and what microorganism it represents. Thank you very much. Bacteria

Feb 08 '24 14:02 mailasb

I think* this should work:

asv_list <- intersect(taxa_names(ps.NY), taxa_names(ps.NJ))

taxa_sub <- ps.rare@tax_table[rownames(ps.rare@tax_table) %in% asv_list, ]

Feb 16 '24 14:02 jeremysutherland

It worked perfectly. Thank you very much!

Feb 26 '24 22:02 mailasb

phyloseq phyloseq copied to clipboard

how to find ASVs present in different "sites"

phyloseq
phyloseq copied to clipboard