dada2
converting a fasta database to dada2 assignTaxonomy format
Hello:
I'm trying to assign taxonomy with the MZGdb database, from https://www.st.nmfs.noaa.gov/copepod/collaboration/metazoogene/atlas/index.html
in particular the COI and 18S databases. The data are offered in fasta, csv, mothur and psv formats.
Is there a way to convert any of them into a dada2-compatible database?
Thanks in advance.
Sisi
The fasta file format for use with assignTaxonomy is described here: https://benjjneb.github.io/dada2/training.html#formatting-custom-databases
So if you can get the database into that format, you are good to go. One thing to note is that assignTaxonomy
assumes a fixed set of taxonomic levels that is consistent across all entries. Some entries may have only the top few levels (e.g. no assignment at genus level or below), but it can't be the case that some entries have, say, a superfamily level while others do not.
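As a rough illustration of that format (a sketch, not part of dada2 itself): each reference sequence gets a header line made of its taxonomy levels joined by semicolons, with a trailing semicolon, followed by the sequence on the next line. The Python below assumes you have already parsed the MZGdb download into (taxonomy list, sequence) pairs; the MZGdb header and column layout is not handled here, and the function name `to_dada2_fasta` and its `n_levels` parameter are hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical record type: (taxon names from the top rank downwards, sequence).
Record = Tuple[List[str], str]


def to_dada2_fasta(records: Iterable[Record], n_levels: int) -> str:
    """Format records as an assignTaxonomy-style training fasta (sketch).

    Each header is the taxonomy joined by ';' with a trailing ';'.
    Entries may stop early (e.g. no genus), but may not have more than
    the fixed number of levels or contain an empty level.
    """
    lines: List[str] = []
    for taxa, seq in records:
        if len(taxa) > n_levels:
            raise ValueError(f"more than {n_levels} levels: {taxa}")
        if any(t.strip() == "" for t in taxa):
            raise ValueError(f"empty taxonomic level: {taxa}")
        lines.append(">" + ";".join(t.strip() for t in taxa) + ";")
        lines.append(seq.upper())
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    records = [
        (["Eukaryota", "Arthropoda", "Hexanauplia"], "acgtn"),
    ]
    print(to_dada2_fasta(records, n_levels=6), end="")
    # >Eukaryota;Arthropoda;Hexanauplia;
    # ACGTN
```

Raising an error on an over-long or gapped lineage, rather than silently writing it, makes inconsistent entries visible before they reach assignTaxonomy.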
How do you handle species whose taxonomic classification is missing or nonexistent at an intermediate rank? For example, the missing order could be written in any of these ways:
Bacteria;Firmicutes;Clostridia;;Gracilibacteraceae;Lutispora;
Bacteria;Firmicutes;Clostridia;NA;Gracilibacteraceae;Lutispora;
Bacteria;Firmicutes;Clostridia;na;Gracilibacteraceae;Lutispora;
Bacteria;Firmicutes;Clostridia;Clostridia_or;Gracilibacteraceae;Lutispora;
Bacteria;Firmicutes;Clostridia;Gracilibacteraceae_or;Gracilibacteraceae;Lutispora;
My taxonomic assignment is flawed: wrong taxa are assigned, and sometimes a sequence is only classified to phylum level even when an exact match is present in the database.
You can replace the absent/NA with "Clostridia_unclassified_order" or something similar.
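A minimal sketch of that suggestion in Python, assuming each taxonomy string has already been split on ';' and that the rank names in `RANKS` match your database (they are illustrative, as is the helper name `fill_missing_ranks`). Missing intermediate levels become `<nearest named ancestor>_unclassified_<rank>`, while missing trailing levels are simply dropped, since assignTaxonomy accepts entries that stop early.

```python
from typing import List, Sequence

# Illustrative rank names; adjust to the levels your database actually uses.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]

# Spellings treated as "no assignment" (compared case-insensitively).
MISSING = {"", "na", "nan", "none"}


def _is_missing(name: str) -> bool:
    return name.strip().lower() in MISSING


def fill_missing_ranks(taxa: Sequence[str], ranks: Sequence[str] = RANKS) -> List[str]:
    taxa = list(taxa)
    # Drop missing trailing levels: the entry simply stops at its last named rank.
    while taxa and _is_missing(taxa[-1]):
        taxa.pop()
    if len(taxa) > len(ranks):
        raise ValueError(f"more levels than ranks: {taxa}")
    filled: List[str] = []
    last_named = None
    for rank, name in zip(ranks, taxa):
        if _is_missing(name):
            if last_named is None:
                raise ValueError(f"top-level rank missing: {taxa}")
            # Name the gap after the closest named ancestor, e.g. Clostridia_unclassified_order.
            filled.append(f"{last_named}_unclassified_{rank}")
        else:
            last_named = name.strip()
            filled.append(last_named)
    return filled


if __name__ == "__main__":
    header = "Bacteria;Firmicutes;Clostridia;NA;Gracilibacteraceae;Lutispora;"
    print(";".join(fill_missing_ranks(header.split(";"))) + ";")
    # Bacteria;Firmicutes;Clostridia;Clostridia_unclassified_order;Gracilibacteraceae;Lutispora;
```

Deriving the placeholder from the nearest named ancestor keeps placeholders distinct across lineages, so unrelated taxa do not end up sharing one generic label such as "NA" at the same rank.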