Incomplete taxonomic paths in .phyloFlash.NTUfull_abundance.csv output

Open chassenr opened this issue 1 year ago • 3 comments

Hi @HRGV and @kbseah ,

I have been using phyloFlash to explore the eukaryotic component of some metagenomes and noticed that the taxonomic paths in the .phyloFlash.NTUfull_abundance.csv output table are all truncated to 7 levels, which is not sufficient for eukaryotes. In my particular example, I am interested in the taxonomic composition of Chytridiomycota (fungi), but the taxonomic path is not resolved beyond this level (phylum). Is there a quick fix for this that I can implement myself? Are you planning to change this in upcoming phyloFlash versions? I know that eukaryotic taxonomic paths are a nightmare (especially if you want to align them with prokaryotic ones), but maybe the tax_slv_ssu_138.1.txt file would help to pick a corresponding set of taxonomic ranks for both prokaryotes and eukaryotes in the output?

Thanks!

Cheers, Christiane

chassenr, Jun 20 '23 08:06

Hi Christiane, thanks for pointing this out. As you note, this is a tricky issue because eukaryotic taxonomic paths are longer and of inconsistent length in the SILVA taxonomy (and in the NCBI taxonomy too).

One possibility I see is to use the PR2 taxonomy paths instead, which are standardized to 9 levels: https://pr2database.github.io/pr2database/articles/pr2_02A_silva.html

I haven't checked though what fraction of the SILVA eukaryotic sequences also appear in PR2. Some groups may not be represented in PR2 because they rely on expert curation for specific taxonomic groups.

I can't make any promises about when a new phyloFlash version will come out. As a stop-gap we could work on a SILVA database with modified taxonomy paths. Will keep this in mind.

kbseah, Jun 20 '23 13:06

Hi @kbseah, thanks for your fast reply. Is there a way to work with the existing phyloFlash output and just parse the SAM file differently to create the NTU table with the complete paths (independent of phyloFlash)? Just as a quick fix? I tried to identify the corresponding code in the Perl scripts, but since I am not a Perl person that was a bit difficult for me...

chassenr, Jun 21 '23 09:06

Hi Christiane, I think the best option for now is to simply parse the SAM file. It contains the SILVA accessions and header lines, which include the taxonomy paths, which you can then summarize at the level you wish.

kbseah, Jun 23 '23 09:06
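
For readers looking for a starting point, the SAM-parsing approach suggested above could be sketched in Python roughly as follows. This is not phyloFlash's own code. It assumes that the reference name (third tab-separated SAM column) has the form `ACCESSION taxonomy;path`, i.e. the SILVA header with the taxonomy after the first whitespace; please check this against your own SAM file. The function names, the `level` parameter, and the file name `mapping.sam` are hypothetical. The sketch counts primary alignment records, not read pairs, so the numbers will not necessarily match the NTU table produced by phyloFlash.

```python
import sys
from collections import Counter


def taxonomy_from_rname(rname):
    """Return the taxonomy path from a reference name, or None if absent.

    ASSUMPTION: the name looks like 'ACCESSION Tax;Path;...', with the
    taxonomy following the first run of whitespace.
    """
    parts = rname.split(None, 1)
    if len(parts) < 2:
        return None
    return parts[1].strip().rstrip(";")


def summarize_sam_taxonomy(lines, level=None):
    """Count primary mapped alignments per taxonomy path.

    level: number of ranks to keep (None keeps the complete path).
    """
    counts = Counter()
    for line in lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a valid alignment record
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100), supplementary (0x800).
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        taxonomy = taxonomy_from_rname(fields[2])
        if taxonomy is None:
            continue
        ranks = taxonomy.split(";")
        if level is not None:
            ranks = ranks[:level]
        counts[";".join(ranks)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python sam_taxonomy.py mapping.sam [level]
    max_level = int(sys.argv[2]) if len(sys.argv) > 2 else None
    with open(sys.argv[1]) as handle:
        result = summarize_sam_taxonomy(handle, max_level)
    for path, n in result.most_common():
        print(f"{path},{n}")
```

Leaving out the level argument keeps the complete path for every reference, so eukaryotic lineages are not cut off at 7 ranks; passing a number truncates all paths to that many ranks.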