sourmash
sourmash copied to clipboard
Splitting table with strain-level resolution (R code)
Hi @taylorreiter,
I recently tried out your tutorial to run Sourmash Gather and import the output into a phyloseq object (discovered during STAMPS!). It worked out super well on the first try!
However, my second try was using Genebank db, which seems to allow strain-level identifications, contrary to GTDB. This resulted in an eight "piece" which the R script wouldn't handle when separating the lineages.
Turns out simply adding the fill="right"
option to separate allows adding a "strains" to the list without getting an error, if you ever want to update the tutorial:
tax_table <- sourmash_taxonomy_results %>%
select(name, lineage) %>%
distinct() %>%
separate(lineage, into = c("domain", "phylum", "class", "order", "family", "genus", "species","strain"), sep = ";", fill="right) %>%
column_to_rownames("name") %>%
select_if(~sum(!is.na(.)) > 0)
Also, I noticed the barplot code is not there, the ps object creation command is simply copy-pasted there!
Thanks again for this awesome tutorial, it saved me hours of work.
EDIT :
added a line to drop the strain column if it's only NAs (so the code works with and without strain-level resolution)
I realised we can aggregate to species level using sourmash tax metagenome , so my first remark might not be as relevant as I thought.
thank you reporting this!