computational requirements for running DADA2

Open madjus98 opened this issue 2 years ago • 1 comments

Dear Dr. Callahan, I am interested in the computational requirements for running DADA2. Like many other researchers, I do not have access to an HPC, and I would like to find reference articles, if any exist, on the minimum computational requirements (CPU and RAM) for running DADA2. I would also like to find articles on the factors that influence processing time. Any help would be useful for optimizing my workflow.

Thank you so much, best regards

madjus98, Apr 10 '24 09:04

Computational cost depends on a variety of factors. As a rule of thumb, "MiSeq-scale" datasets (~10M reads, ~100k reads per sample) can typically be run on a modern (<6 years old?) laptop, while larger datasets can benefit from moving to an HPC. Other factors that influence computation time are read length (roughly linear increase in computation time with length), sample diversity (higher diversity -> higher compute time), and the error rate of the underlying sequencing technology (higher error rates -> higher compute time).
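
To see where a dataset sits relative to that rule of thumb, one option is to count reads and measure mean read length before starting. The following is a minimal sketch in Python, not part of DADA2: the function names `fastq_stats` and `summarize` are hypothetical, it assumes standard 4-line FASTQ records (plain or gzipped), and it treats the "~10M reads" figure as a hard cutoff purely for illustration.

```python
import gzip

# "~10M reads" from the rule of thumb above; using it as a strict
# threshold is an assumption made for this sketch.
RULE_OF_THUMB_TOTAL_READS = 10_000_000


def fastq_stats(path):
    """Return (read_count, mean_read_length) for one FASTQ file.

    Assumes 4-line records; files ending in .gz are read as gzip.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    reads = 0
    bases = 0
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record is the sequence
                reads += 1
                bases += len(line.strip())
    mean_len = bases / reads if reads else 0.0
    return reads, mean_len


def summarize(paths):
    """Collect per-file stats and compare the total to the rule of thumb."""
    per_file = {str(p): fastq_stats(p) for p in paths}
    total = sum(reads for reads, _ in per_file.values())
    return {
        "per_file": per_file,
        "total_reads": total,
        "at_or_below_rule_of_thumb": total <= RULE_OF_THUMB_TOTAL_READS,
    }
```

Calling `summarize` on the list of forward-read files gives the total read count and per-sample counts, which can be compared with the ~10M total and ~100k per-sample figures. Read count and length are only two of the factors listed; diversity and error rate are not captured here.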

It is recommended to have at least 8GB of memory available. assignTaxonomy in particular can be memory-constrained with larger reference databases like Silva, so if memory becomes the limiting factor, finding a machine with more memory is recommended.
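
A quick way to check a machine against the 8GB figure is sketched below. This is an illustrative Python snippet, not a DADA2 utility: the helper names are hypothetical, it reports total physical memory rather than memory currently free, and the `os.sysconf` lookup is only expected to work on Unix-like systems (it returns `None` elsewhere).

```python
import os

RECOMMENDED_GB = 8  # "at least 8GB" from the recommendation above


def total_memory_gb():
    """Total physical memory in GiB, or None if it cannot be determined."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None  # e.g. os.sysconf or these names are unavailable
    return page_size * page_count / 1024**3


def meets_recommendation(mem_gb, minimum=RECOMMENDED_GB):
    """True/False if mem_gb is known, None if it is unknown."""
    if mem_gb is None:
        return None
    return mem_gb >= minimum


if __name__ == "__main__":
    mem = total_memory_gb()
    print("CPU cores:", os.cpu_count())
    print("Total memory (GiB):", mem)
    print("Meets 8GB recommendation:", meets_recommendation(mem))
```

Note that a machine sold with 8GB of RAM usually reports slightly less than 8 GiB here, and other running programs reduce what is actually available, so a result near the threshold should be read as borderline.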

benjjneb, Apr 11 '24 15:04