
How to handle Element AVITI data (16S amplicon sequencing)

Open · mniku opened this issue on Feb 23, 2024 · 2 comments

I'm processing our first Element AVITI 16S amplicon sequencing dataset with DADA2 in QIIME 2. I'm wondering how to do this optimally, because the data seem to behave somewhat differently from MiSeq data in DADA2:

Phred quality stays much higher along the read than it usually does with MiSeq, but to my surprise I still need to truncate the reads just as short to get a similar percentage of reads through DADA2. For more details and the actual statistics, see this thread in the QIIME 2 forum.

At first I wondered whether the AVITI Phred scores were a bit optimistic, but I got a potentially interesting comment from an AVITI bioinformatician. They think the scoring should be quite comparable to MiSeq, BUT: “There are differences in the distribution of q scores within a read, however, which would be relevant to DADA2 filtering. AVITI data will have greater q score variance within a read--in AVITI data you would be more likely to find a single low q score in the middle of a high quality read.”

Therefore, they recommended that we try adjusting maxEE. Does this sound like a good idea? How should we evaluate and validate the results?
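The filtering arithmetic behind this suggestion can be sketched in Python. DADA2's `filterAndTrim()` computes a read's expected errors as EE = sum of 10^(-Q/10) over its bases and discards reads with EE above `maxEE`; it also truncates each read at the first base with quality less than or equal to `truncQ` (default 2), and discards reads shorter than `truncLen`. QIIME 2's `dada2 denoise-paired` exposes these as `--p-max-ee-f`/`--p-max-ee-r`, `--p-trunc-q` and `--p-trunc-len-f`/`--p-trunc-len-r`. The function `filter_read` below is a hypothetical simplification of that logic, not DADA2's code, and the quality profiles are invented rather than taken from AVITI or MiSeq runs. It is meant only to show how an isolated mid-read dip and a gradual quality decline affect the filters differently, not to diagnose this dataset.

```python
# Hypothetical simplification of DADA2 filterAndTrim() read filtering
# (truncQ, then truncLen, then maxEE); not DADA2's actual implementation.
# Quality profiles below are invented, not real AVITI or MiSeq data.


def expected_errors(quals):
    """Expected number of errors in a read: sum of Phred error probabilities."""
    return sum(10 ** (-q / 10) for q in quals)


def filter_read(quals, trunc_len, max_ee=2.0, trunc_q=2):
    """Return (kept, ee) for one read; ee is None if the read is dropped on length."""
    # Cut the read at the first base whose quality is <= trunc_q.
    cut = next((i for i, q in enumerate(quals) if q <= trunc_q), len(quals))
    quals = quals[:cut]
    # Reads now shorter than trunc_len are discarded outright.
    if len(quals) < trunc_len:
        return False, None
    # Otherwise truncate to trunc_len and apply the expected-errors filter.
    ee = expected_errors(quals[:trunc_len])
    return ee <= max_ee, ee


READ_LEN = 250

# Uniform Q38 with a single Q2 base at position 120 (an isolated dip).
dip = [38] * READ_LEN
dip[119] = 2

# Same read, but the dip only falls to Q10 (above the default truncQ).
mild_dip = [38] * READ_LEN
mild_dip[119] = 10

# No dips; quality declines steadily from Q36 to Q12 along the read.
decay = [round(36 - 24 * i / (READ_LEN - 1)) for i in range(READ_LEN)]

for name, read in [("dip", dip), ("mild_dip", mild_dip), ("decay", decay)]:
    for trunc_len in (110, 200, 250):
        kept, ee = filter_read(read, trunc_len)
        ee_text = "n/a" if ee is None else f"{ee:.3f}"
        print(f"{name:8s} truncLen={trunc_len:3d} EE={ee_text:>6s} kept={kept}")
```

In this toy example the Q2 dip alone adds only about 0.63 expected errors, so `maxEE` would keep the read, yet the `truncQ` step shortens it below any `truncLen` past position 119 and the read is dropped. The Q10 dip passes at full length, while the steadily declining read fails `maxEE=2` at 250 bases and passes only after shorter truncation.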

We have a huge number of reads to begin with, so losing reads is not a problem in itself, and by truncating close to the minimum required overlap we get a percentage of reads through that is comparable to MiSeq. But it feels stupid to throw away high-quality data just because I don't fully understand what's going on.
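Truncating "close to the minimum required overlap" comes down to simple arithmetic: for forward and reverse reads truncated to lengths F and R on an amplicon of length L (after any primer removal), the two reads share F + R - L bases, and DADA2's `mergePairs()` requires at least `minOverlap` of them (default 12). The sketch below uses hypothetical amplicon lengths and made-up helper names. It illustrates that the minimum has to hold for the longest amplicon in the sample, so truncation chosen to just meet the overlap for a typical length can leave pairs from longer variants unmerged.

```python
# Sketch of paired-end overlap arithmetic; amplicon lengths are hypothetical.

MIN_OVERLAP = 12  # default minOverlap of dada2's mergePairs()


def overlap_bp(trunc_len_f, trunc_len_r, amplicon_len):
    """Bases covered by both truncated reads of a pair from one amplicon."""
    return trunc_len_f + trunc_len_r - amplicon_len


def merges_all(trunc_len_f, trunc_len_r, amplicon_lens, min_overlap=MIN_OVERLAP):
    """True only if every amplicon length still leaves enough overlap to merge."""
    return all(
        overlap_bp(trunc_len_f, trunc_len_r, n) >= min_overlap for n in amplicon_lens
    )


# Hypothetical primer-free amplicon lengths spanning taxa in one sample.
amplicon_lens = [250, 253, 255]

for f, r in [(150, 120), (140, 120), (200, 180)]:
    worst = min(overlap_bp(f, r, n) for n in amplicon_lens)
    ok = merges_all(f, r, amplicon_lens)
    print(f"truncLen=({f}, {r}) worst-case overlap={worst} bp, all merge={ok}")
```

Checking the worst case over a range of lengths, rather than a single nominal amplicon length, is the design choice here: it makes explicit how much margin a given pair of truncation lengths leaves.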

The QIIME 2 folks recommended that I open an issue here, because this calls for a deep understanding of DADA2.

mniku, Feb 23 '24 18:02