Possible misclassification using DADA2 to characterize jellyfish microbiome

Open NogaBarak opened this issue 2 years ago • 1 comment

Hi Ben,

I'm working on the jellyfish microbiome and recently ran a primer comparison to find the best primers for my samples. I noticed that, for what seems to be the same part of the community, I get a specific taxon (Lactobacillaceae) with one primer set (V1V2), whereas with the others (V3V4/V4V5) I get something else (unclassified Bacilli). I used the SILVA database for taxonomic assignment. When I searched the ASVs assigned to these taxa manually against SILVA and NCBI, they all came back as Spiroplasma. I wonder why the DADA2 assignment gives different results from my manual BLAST. I see that the matching sequences on NCBI do not exist in SILVA, and I know there is a lack of information on jellyfish bacteria.

So is it possible that the problem is related to the small number of sequences originating from jellyfish bacteria? If you have any way to help with the issue and would like more technical details, I would be happy to provide them.

Thank you very much,

Noga

NogaBarak · Apr 06 '22 07:04

> So is it possible that the problem is related to the small number of sequences originating from jellyfish bacteria?

Yes, that is likely. assignTaxonomy implements the naive Bayesian classifier method (see the original paper: https://doi.org/10.1128/AEM.00062-07). In my experience the primary way this algorithm "breaks" is when there is nothing similar to the query sequence in the reference database. Given the paucity of research on jellyfish microbiomes, and perhaps on Mollicutes in general as well, this seems the most likely explanation here. One way to check further would be to BLAST (or the like) the relevant sequences against the SILVA fasta file, and see if they hit anything very close.
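
To make the failure mode concrete, here is a toy sketch of an RDP-style naive Bayesian classifier in Python. It is not the DADA2 code: the function names, the random tie-breaking, and the synthetic sequences are all assumptions made for illustration, and the bootstrap is simplified (it resamples one eighth of the words without replacement). The point is only that the classifier always returns some label, even for a query with no close relative in the references, and the bootstrap support is what should flag such cases.

```python
import math
import random
from collections import Counter

K = 8  # word length, as in the RDP classifier paper


def words(seq, k=K):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(refs):
    """Build a model from refs, a dict mapping taxon -> list of sequences."""
    n_refs = sum(len(seqs) for seqs in refs.values())
    seen_in = Counter()  # number of reference sequences containing each word
    per_taxon = {}
    for taxon, seqs in refs.items():
        counts = Counter()
        for s in seqs:
            counts.update(words(s))
        seen_in.update(counts)
        per_taxon[taxon] = (counts, len(seqs))
    return per_taxon, seen_in, n_refs


def classify(query_words, model, rng):
    """Return the taxon with the highest log-likelihood for the given words."""
    per_taxon, seen_in, n_refs = model
    scores = {}
    for taxon, (counts, m) in per_taxon.items():
        scores[taxon] = sum(
            # word prior (n(w) + 0.5) / (N + 1) smooths words unseen in a taxon
            math.log((counts[w] + (seen_in[w] + 0.5) / (n_refs + 1)) / (m + 1))
            for w in query_words
        )
    top = max(scores.values())
    # Assumption of this sketch: exact ties are broken at random.
    return rng.choice([t for t, s in scores.items() if s == top])


def assign(seq, model, trials=100, seed=0):
    """Return (label, bootstrap support in percent) for one query sequence."""
    rng = random.Random(seed)
    all_words = sorted(words(seq))
    label = classify(all_words, model, rng)
    n_sub = max(1, len(all_words) // 8)
    agree = sum(
        classify(rng.sample(all_words, n_sub), model, rng) == label
        for _ in range(trials)
    )
    return label, 100.0 * agree / trials


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic references: one random 200 bp sequence per hypothetical taxon.
    refs = {name: ["".join(rng.choice("ACGT") for _ in range(200))]
            for name in ("TaxonA", "TaxonB", "TaxonC")}
    model = train(refs)
    ref_b = refs["TaxonB"][0]
    near = ref_b[:100] + "A" + ref_b[101:]  # at most one substitution
    novel = "".join(rng.choice("ACGT") for _ in range(200))  # unrelated query
    print("near :", assign(near, model))
    print("novel:", assign(novel, model))
```

In assignTaxonomy the corresponding cutoff is the minBoot argument (default 50), and outputBootstraps=TRUE returns the per-rank support, which is worth inspecting for the ASVs in question.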

> If you have any way to help with the issue and would like more technical details, I would be happy to provide them.

I think you are already kind of doing the right thing, looking with a critical eye at your results and digging in deeper where appropriate. BLAST by hand is a useful tool I also use for specific ASVs when I want to increase my confidence in their classification.
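
If BLAST is not at hand, a rough first pass at "is there anything close in the reference?" can be sketched in plain Python. To be clear, this is a crude stand-in and not BLAST: it scores shared 8-mers (Jaccard similarity) rather than aligning anything, so the score is not a percent identity. The function names and the tiny inline FASTA are made up for the example.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def kmer_set(seq, k=8):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_hits(query, references, k=8, top=3):
    """Rank references by Jaccard similarity of k-mer sets with the query."""
    q = kmer_set(query.upper(), k)
    scored = []
    for header, seq in references:
        r = kmer_set(seq, k)
        union = q | r
        score = len(q & r) / len(union) if union else 0.0
        scored.append((score, header))
    scored.sort(reverse=True)
    return scored[:top]


if __name__ == "__main__":
    # Placeholder reference; in practice, open the reference fasta file instead.
    fasta = io.StringIO(
        ">TaxonA;\nACGTACGTTGCAAGCTTAGGCATCGA\n"
        ">TaxonB;\nTTTTTTTTTTTTTTTTTTTTTTTTTT\n"
    )
    refs = list(read_fasta(fasta))
    for score, header in best_hits("ACGTACGTTGCAAGCTTAGGCATCGA", refs):
        print(f"{score:.2f}  {header}")
```

A top score near zero for an ASV means the reference has nothing resembling it, which is the situation where the classifier output deserves the least trust.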

I don't have an easy technical solution, however. Augmenting the database would help, but that brings up all sorts of complications on where to get the appropriate reference sequences for your environment. You could also consider the IdTaxa method described as an alternative in the DADA2 tutorial. That method should be less likely to "overclassify" sequences that are distant from the database, but won't fix the underlying problem of incomplete references.
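
For the database-augmentation route, the mechanical part is small. The sketch below appends extra entries to an uncompressed training fasta in the layout assignTaxonomy expects, where the id line is the semicolon-delimited taxonomy (for example ">Kingdom;Phylum;Class;Order;Family;Genus;"). The function names, file names, lineage, and sequence are placeholders: the lineage has to match the naming of the reference release you actually use, and the sequences have to come from a source you trust.

```python
def training_entry(lineage, seq):
    """Format one reference as a taxonomy id line plus a single sequence line."""
    return ">" + ";".join(lineage) + ";\n" + seq.upper() + "\n"


def augment_reference(ref_path, out_path, extra):
    """Copy ref_path to out_path, then append the (lineage, seq) pairs in extra."""
    with open(ref_path) as src, open(out_path, "w") as out:
        last = "\n"
        for line in src:
            out.write(line)
            last = line
        if not last.endswith("\n"):
            out.write("\n")  # keep the first new header on its own line
        for lineage, seq in extra:
            out.write(training_entry(lineage, seq))


if __name__ == "__main__":
    # Placeholder lineage and sequence, for illustration only.
    extra = [(
        ["Bacteria", "Firmicutes", "Bacilli", "Entomoplasmatales",
         "Spiroplasmataceae", "Spiroplasma"],
        "acgtacgtacgtacgtacgt",
    )]
    with open("toy_reference.fa", "w") as fh:  # stand-in for the real reference
        fh.write(">Bacteria;Firmicutes;Bacilli;\nTTGACCGTAAGGCTAGCTAA")
    augment_reference("toy_reference.fa", "toy_reference_augmented.fa", extra)
    print(open("toy_reference_augmented.fa").read())
```

The augmented file can then be passed to assignTaxonomy in place of the original reference. This does nothing about the harder question of which sequences belong in it.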

benjjneb · Apr 07 '22 18:04