Reference database for the function assignTaxonomy()

Open madjus98 opened this issue 2 years ago • 3 comments

Dear Dr. Callahan, I'm working with DADA2 for food authentication and microbiome analysis, and I am seeking guidance on finding a suitable FASTA database for the assignTaxonomy() function. While I understand NCBI no longer provides pre-built FASTA databases, my experience with format conversion is limited. Any recommendations for finding a good database would be appreciated. Moreover, the SILVA databases may also be too limited for my purposes, and I was not able to find a new and adequate version on the SILVA website.

I really appreciate your help!

madjus98 avatar Apr 10 '24 09:04 madjus98

A set of pre-built taxonomic reference databases for assignTaxonomy() are available here: https://benjjneb.github.io/dada2/training.html

If those meet your needs, then great. There is also a section on that page about how to format custom databases should that become a necessity.

benjjneb avatar Apr 11 '24 15:04 benjjneb

Thank you so much for your answer. I was also wondering whether it is possible to convert the databases available on NCBI or SILVA to a format usable by assignTaxonomy(). Do you have any tutorial or guidelines for that?

Thank you so much

madjus98 avatar Apr 11 '24 16:04 madjus98

We have pre-formatted versions of Silva at the page linked above.

> Do you have any tutorial or guidelines for that?

Nothing that rises to the level of a tutorial. The https://benjjneb.github.io/dada2/training.html#formatting-custom-databases section describes the required format for assignTaxonomy (which is pretty simple). There is also code in the dada2 R package to do this for Silva and RDP, but that code is more involved than will often be necessary because we do some QC on the underlying databases at the same time.

RDP code: https://github.com/benjjneb/dada2/blob/master/R/taxonomy.R#L382

Silva code: https://github.com/benjjneb/dada2/blob/master/R/taxonomy.R#L501
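
For readers who want to see what the required format looks like in practice, here is a minimal sketch of a header conversion. It assumes the layout described on the training page: each FASTA ID line holds only the taxonomy, with levels separated by and ending in a semicolon (`>Level1;Level2;...;`). The input layout (`>SEQ_ID tax1;tax2;...` with RNA sequences) and the helper names `to_dada2_header` and `convert_fasta` are assumptions made for illustration. This is not the dada2 package's own code, and it does none of the QC mentioned above.

```python
def to_dada2_header(header, max_levels=6):
    """Hypothetical helper: '>SEQ_ID tax1;tax2;...' -> '>tax1;tax2;...;'."""
    # Assumption: everything before the first space is the sequence ID,
    # which is dropped; the rest is a semicolon-separated lineage.
    _, _, lineage = header.lstrip(">").strip().partition(" ")
    levels = [lvl.strip() for lvl in lineage.split(";") if lvl.strip()]
    if not levels:
        raise ValueError(f"no taxonomy found in header: {header!r}")
    # Keep at most max_levels levels and terminate with a semicolon.
    return ">" + ";".join(levels[:max_levels]) + ";"


def convert_fasta(lines, max_levels=6):
    """Yield FASTA lines with taxonomy-only headers and DNA sequences."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            yield to_dada2_header(line, max_levels)
        else:
            # Assumption: the source stores RNA (U); convert to DNA (T).
            yield line.upper().replace("U", "T")


# Made-up example record, not taken from any real database.
example = [
    ">AB000001.1.1500 Bacteria;Firmicutes;Bacilli;Lactobacillales;"
    "Lactobacillaceae;Lactobacillus;Lactobacillus sp.",
    "ACGUACGU",
]
for out_line in convert_fasta(example):
    print(out_line)
```

Truncating every lineage to a fixed number of levels is naive: real database lineages may not map cleanly onto six ranks, which is one reason the pre-formatted references linked above are the safer choice.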

benjjneb avatar Apr 11 '24 16:04 benjjneb

Thank you so much for your reply. I believed that the databases proposed here: https://benjjneb.github.io/dada2/training.html#formatting-custom-databases were only for training. Do you think that they could also be reliable for publishing paper results? I really don't know whether they are the complete Silva and RDP databases or only a part of them. If you think they are fine for publications, my work is done and your suggestion was really helpful.

Finally, regarding the code that you sent for producing them: it's quite difficult, and I would really like to learn how to manage it. Do you have any advice, for example on which files I need to download from SILVA to produce an updated version usable with DADA2? Thank you so much!

madjus98 avatar May 17 '24 12:05 madjus98

> Do you think that they could be reliable also for publishing paper results?

Yes, the officially supported references are suitable for publishing paper results.

benjjneb avatar May 17 '24 13:05 benjjneb

Okay thanks!

madjus98 avatar May 17 '24 13:05 madjus98