converting a fasta database to dada2 assignTaxonomy format

Open soluna1 opened this issue 1 year ago • 1 comments

Hello:

I'm trying to assign taxonomy with the MZGdb database, from https://www.st.nmfs.noaa.gov/copepod/collaboration/metazoogene/atlas/index.html, particularly the COI and 18S databases. The data are offered in FASTA, CSV, mothur and PSV formats.

Is there a way to convert any of them into a dada2-compatible database?

Thanks in advance.

Sisi

soluna1 avatar Oct 11 '22 11:10 soluna1

The description of the fasta file format for use with assignTaxonomy is here: https://benjjneb.github.io/dada2/training.html#formatting-custom-databases

So if you can get the database into that format, you are good to go. One thing to note is that assignTaxonomy assumes a fixed set of taxonomic levels that is consistent across all entries. Some entries might only have the top few levels (e.g. no assignment at genus level or below), but it can't be the case that some entries have an extra level, such as super-family, while others do not.
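As a rough illustration of that format, here is a minimal Python sketch that turns one entry into a FASTA record with a `>Level1;Level2;...;` header line followed by the sequence. The `RANKS` list, the `to_dada2_record` helper and the dictionary input are assumptions made for this example; they are not part of dada2 or of the MZGdb files, so the parsing of your actual source file would need to be written separately.

```python
# Hypothetical fixed rank order; change it to match the levels in your database.
RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus"]


def to_dada2_record(lineage, sequence, ranks=RANKS):
    """Build a two-line FASTA record with a semicolon-delimited taxonomy header.

    `lineage` maps a rank name to a taxon name. Ranks missing at the
    bottom are simply left off; an unknown rank or a gap in the middle
    raises ValueError, because every entry must use the same levels.
    """
    unknown = set(lineage) - set(ranks)
    if unknown:
        raise ValueError(f"ranks not in the fixed set: {sorted(unknown)}")

    names = []
    for rank in ranks:
        name = lineage.get(rank)
        if not name:
            break  # stop at the first rank without an assignment
        names.append(name)

    if not names:
        raise ValueError("lineage has no top-level assignment")
    # Anything assigned below the first missing rank means a mid-lineage gap.
    if any(lineage.get(rank) for rank in ranks[len(names):]):
        raise ValueError("gap in the middle of the lineage")

    header = ">" + ";".join(names) + ";"
    return header + "\n" + sequence.upper() + "\n"


record = to_dada2_record(
    {"Kingdom": "Bacteria", "Phylum": "Firmicutes", "Class": "Clostridia"},
    "acgtacgt",
)
print(record, end="")
```

Rejecting unknown ranks up front is one way to catch entries that carry a level (such as super-family) that the rest of the database lacks.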

benjjneb avatar Oct 11 '22 15:10 benjjneb

How do you handle species with missing/nonexistent taxonomic classifications in the middle? I have tried the following variants:

Bacteria;Firmicutes;Clostridia;;Gracilibacteraceae;Lutispora;
Bacteria;Firmicutes;Clostridia;NA;Gracilibacteraceae;Lutispora;
Bacteria;Firmicutes;Clostridia;na;Gracilibacteraceae;Lutispora;
Bacteria;Firmicutes;Clostridia;Clostridia_or;Gracilibacteraceae;Lutispora;
Bacteria;Firmicutes;Clostridia;Gracilibacteraceae_or;Gracilibacteraceae;Lutispora;

My taxonomic assignment is flawed: wrong taxa are assigned, and sometimes a sequence only matches at phylum level even when an exact match is present.

joran520 avatar Oct 25 '22 07:10 joran520

You can replace the absent/NA entry with "Clostridia_unclassified_order" or something similar.
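A minimal Python sketch of that replacement, assuming six levels and the `<parent>_unclassified_<rank>` naming pattern from the suggestion above. The `fill_gaps` helper, the `RANKS` list and the set of strings treated as missing are all assumptions for illustration, not anything dada2 provides.

```python
# Hypothetical rank names, used only to build the placeholder labels.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]
# Strings treated as "no assignment" (assumption; extend as needed).
MISSING = {"", "na", "nan", "none"}


def fill_gaps(taxonomy, ranks=RANKS):
    """Replace empty/NA levels in the middle of a lineage with a placeholder
    named after the nearest assigned parent level."""
    fields = [f.strip() for f in taxonomy.rstrip(";").split(";")]
    # Missing levels at the bottom are allowed, so drop them instead of filling.
    while fields and fields[-1].lower() in MISSING:
        fields.pop()
    if not fields:
        raise ValueError("taxonomy string has no assigned levels")
    if len(fields) > len(ranks):
        raise ValueError("more levels than the fixed rank set allows")

    filled = []
    last_known = None
    for rank, name in zip(ranks, fields):
        if name.lower() in MISSING:
            if last_known is None:
                raise ValueError("top level is missing, nothing to name it after")
            name = f"{last_known}_unclassified_{rank}"
        else:
            last_known = name
        filled.append(name)
    return ";".join(filled) + ";"


fixed = fill_gaps("Bacteria;Firmicutes;Clostridia;NA;Gracilibacteraceae;Lutispora;")
print(fixed)
```

Because the placeholder is derived from the parent name, two unrelated lineages with a missing order do not end up sharing the same label.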

salix-d avatar Oct 25 '22 15:10 salix-d