Can the DADA2 pipeline process MinION ITS1 and ITS2 single-read FASTQ files?
Is it possible to use DADA2 to process ITS1 or ITS2 single-read FASTQ files generated by ONT? The reads average about 750 bases in length. Thanks for your input.
No, the error rate on standard ONT data is too high for the DADA2 approach to be effective.
Do you think there are some (relatively) simple algorithmic tweaks that might make it possible to denoise these longer, more error-prone reads in the future? For instance, what if the center of each partition were the consensus sequence of all of its reads rather than the most abundant exact sequence? When splitting a partition, the single read with the smallest p-value could still serve as an alternate nucleus for assigning reads to the new partitions, but then after the partitions are formed, the consensus could be updated for each before the next iteration or final output. I can foresee a couple extra problems with this approach, but they might be tolerable...
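The difference between the two choices of partition center can be illustrated with a toy example. This is only a sketch of the idea proposed above, not DADA2's algorithm: it assumes the reads in a partition are already aligned to the same length (real ONT reads contain indels, so a multiple-sequence alignment step would be needed first), and the function names `most_abundant` and `column_consensus` are hypothetical.

```python
from collections import Counter


def most_abundant(reads):
    """Return the most frequent exact sequence (ties go to the first one seen)."""
    return Counter(reads).most_common(1)[0][0]


def column_consensus(aligned_reads):
    """Return the per-column majority base of equal-length, pre-aligned reads.

    Assumption: the reads are already aligned, so column i of every read
    refers to the same position in the underlying molecule.
    """
    if len({len(r) for r in aligned_reads}) != 1:
        raise ValueError("reads must be pre-aligned to the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


# Toy partition: the true sequence is ACGTACGT, but every read carries
# exactly one substitution, so no read in the partition is error-free.
partition = [
    "TCGTACGT",
    "AGGTACGT",
    "ACTTACGT",
    "ACGAACGT",
    "ACGTTCGT",
]

print("most abundant read:", most_abundant(partition))
print("column consensus:  ", column_consensus(partition))
```

In this toy case every read is unique, so the most abundant exact sequence is just an arbitrary erroneous read, while the column-wise consensus recovers the underlying sequence because the errors fall at different positions.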
I think this is going to be a recurring question. ONT chemistries continue to improve rapidly, but I know the current algorithmic constraint against long reads is not just Q-score, but the length itself. I would sure love to do full ribosomal operon or cistron amplicon sequencing with the new tech!
@rmcminds For your first paragraph: No, the DADA2 approach is not possible for current ONT data. What you describe may be possible, but it departs far enough from the assumptions made by DADA2 (in particular, that a non-negligible number of reads are error-free) that it would amount to a new method.
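The error-free-read assumption mentioned above can be made concrete with a quick back-of-the-envelope calculation. The sketch below assumes errors are independent and uniform across positions, which is a simplification, and the per-base error rates are illustrative values rather than measured ONT figures. The helper name `p_error_free` is hypothetical.

```python
def p_error_free(read_length, per_base_error):
    """Probability that a read has no errors at all.

    Assumption: each base is wrong independently with the same probability,
    so the chance of an error-free read is (1 - e) ** L.
    """
    return (1.0 - per_base_error) ** read_length


# 750 bases is the average read length mentioned in the original question.
# The error rates below are illustrative, not measured values.
for error_rate in (0.001, 0.01, 0.05):
    fraction = p_error_free(750, error_rate)
    print(f"per-base error {error_rate:.3f}: error-free fraction {fraction:.3g}")
```

Under these assumptions, roughly half of 750-base reads are error-free at a 0.1% per-base error rate, but the fraction drops well below one in a thousand at 1% and becomes negligible at 5%. It also shows why read length matters independently of Q-score: the exponent grows with every additional base.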
> I think this is going to be a recurring question. ONT chemistries continue to improve rapidly, but I know the current algorithmic constraint against long reads is not just Q-score, but the length itself. I would sure love to do full ribosomal operon or cistron amplicon sequencing with the new tech!
Active area of research. ONT Duplex is very interesting to us. In fact, if anyone has a line on someone at ONT that would be interested in putting together Duplex and DADA2, please reach out.
It seems the duplex pipeline doesn't really work for amplicons because it's hard to know whether paired reads actually came from the same original molecule. It had me thinking that the denoising step normally done by DADA2 is actually something that should happen during basecalling for ONT: sorting reads into partitions based on their raw signals (or at least something closer to raw, e.g., full 4-base probabilities) and determining the sequence for all the reads in each partition at once. I created an issue (https://github.com/nanoporetech/dorado/issues/536) on their Dorado basecaller repo inquiring about plans or possibilities and haven't heard back yet.
Thanks for replying. I agree now that DADA2 isn't ideal for dealing with ONT data. But much of your logic could be useful in whatever does get developed!