
Assign Taxonomy for V3-V4 region

Open microb123 opened this issue 1 year ago • 2 comments

Hi

I am trying to assign taxonomy for my V3-V4 sequencing data following your tutorial https://benjjneb.github.io/dada2/tutorial.html. Using the silva_nr99_v138.1_wSpecies_train_set.fa.gz database resulted in more species assignments than using silva_nr99_v138.1_train_set.fa.gz followed by silva_species_assignment_v138.1.fa.gz.

Issue thread #1256 suggested using silva_nr99_v138.1_wSpecies_train_set.fa.gz for long reads, and silva_nr99_v138.1_train_set.fa.gz followed by silva_species_assignment_v138.1.fa.gz for standard short reads. I am wondering which one I should use for my V3-V4 target, which is 427 nucleotides long.

Also, the Zenodo page has a note stating a problem with some families and genera. Is it still okay to use the given database files?

“NOTE: These database files have a known problem in 3/895 families and 59/3936 genera. See https://github.com/mikemc/dada2-reference-databases/blob/main/silva-138.1/v1/bad-taxa.csv for a list of affected taxa and https://github.com/benjjneb/dada2/issues/1293 for more information”

Thank you

microb123 avatar Jul 28 '22 07:07 microb123

Because of the limited information available in short-read 16S, we recommend the more conservative species-level assignment method (exact, unambiguous matching) implemented in assignSpecies for such data.
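To make "exact, unambiguous matching" concrete, here is a minimal sketch in plain Python. It is not the dada2 implementation: the function name, the toy reference sequences, and the choice to treat a match as the query occurring verbatim inside a reference sequence are all assumptions made for illustration. A species is returned only when every exact hit agrees on a single species.

```python
def assign_species_exact(query, references):
    """Return the one species whose reference sequences contain `query`
    exactly, or None when there is no exact hit or the hits disagree.

    `references` is an iterable of (species_label, sequence) pairs.
    Hypothetical helper for illustration; not part of dada2.
    """
    hits = {species for species, ref_seq in references if query in ref_seq}
    return next(iter(hits)) if len(hits) == 1 else None


# Hypothetical toy references; real reference sequences are far longer.
refs = [
    ("Escherichia coli", "ACGTACGTTTGA"),
    ("Escherichia fergusonii", "ACGTACGTTTGC"),
    ("Bacillus subtilis", "GGGTTTAAACCC"),
]

unique_hit = assign_species_exact("GGGTTTAAA", refs)  # matches one species only
ambiguous = assign_species_exact("ACGTACGT", refs)    # matches two species
no_hit = assign_species_exact("ACGTACGA", refs)       # one mismatch, no exact hit
```

In this sketch the second query is left unassigned because two species share it exactly, and the third because a single mismatch rules out an exact hit. That strictness is what makes the approach conservative for short amplicons.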

It is still OK to use the Silva files. Post hoc, if any interesting taxa crop up, it can be useful to check them against the list of affected taxa. The issue there isn't that they are totally wrong; it's just that those taxa had a higher taxonomic level unassigned by Silva, and hence have, e.g., their genus in the Family position after running assignTaxonomy.
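The post hoc check described above could be sketched as follows, again in plain Python rather than with dada2 itself. Everything here is a placeholder: the taxon names, the dictionary layout of the taxonomy table, and the helper name are hypothetical, and the set of affected names would have to be filled in by hand from the bad-taxa.csv file linked in the question, whose column layout is not assumed here.

```python
def flag_affected(tax_rows, affected_names, ranks=("Family", "Genus")):
    """List (row_id, rank, name) for assignments whose name is on the affected list.

    `tax_rows` maps a sequence ID to a dict of rank -> assigned name (or None).
    Hypothetical helper for illustration; not part of dada2.
    """
    flagged = []
    for row_id, row in tax_rows.items():
        for rank in ranks:
            name = row.get(rank)
            if name is not None and name in affected_names:
                flagged.append((row_id, rank, name))
    return flagged


# Placeholder names only; substitute the taxa listed in bad-taxa.csv.
affected = {"GenusX"}
taxa = {
    "ASV1": {"Family": "FamilyA", "Genus": "GenusA"},
    "ASV2": {"Family": "GenusX", "Genus": None},  # genus name sitting in the Family slot
}

flagged = flag_affected(taxa, affected)
```

Checking both the Family and Genus columns matters because, as noted above, an affected genus name can show up one rank higher than expected.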

benjjneb avatar Jul 28 '22 17:07 benjjneb

Thank you for your suggestions.

microb123 avatar Jul 29 '22 18:07 microb123