
Primers, cutadapt output and chimera

Open luigallucci opened this issue 6 days ago • 1 comments

Hi @benjjneb,

I have several questions, hopefully not all stupid ones.

I'm working with sequences amplified with a classical set of primers...

FWD <- "CCTACGGGNGGCWGCAG"

REV <- "GACTACHVGGGTATCTAATCC"

I'm removing them with Cutadapt, using a slightly modified version of the ITS tutorial workflow. The samples are environmental, and we expect them to contain organisms whose 16S rRNA genes are rich in introns (we are sequencing the V3-V4 region). After running a pipeline that uses pooling for the dada algorithm and the following settings for chimera removal...

derepF <- derepFastq(filtFs)
dadaFs <- dada(derepF, err = errF, pool = TRUE, multithread = 30, verbose = TRUE)
derepR <- derepFastq(filtRs)
dadaRs <- dada(derepR, err = errR, pool = TRUE, multithread = 30, verbose = TRUE)
mergers1 <- mergePairs(dadaFs, filtFs, dadaRs, filtRs, minOverlap = 12, verbose = TRUE)

seqtab <- makeSequenceTable(mergers1)
seqtab.nochim <- removeBimeraDenovo(seqtab, method="pooled", multithread= 20, verbose=TRUE)

...the output shows a high removal rate of ASVs, although most of the reads are preserved. There were 29890 ASVs before chimera removal and 9822 after it, while 82% of the reads were retained (sum(seqtab.nochim)/sum(seqtab)).
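To make the two numbers above easier to compare, here is a minimal sketch of how I am computing them. The toy table and its counts are made up for illustration (the names `seqtab_toy` and `seqtab_toy_nochim` are hypothetical); only the last line uses the real ASV counts from my run.

```r
# Toy sequence table (samples x ASVs); counts are invented for illustration only.
seqtab_toy <- matrix(c(90,  5, 3, 2,
                       80, 10, 6, 4),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("S1", "S2"),
                                     c("asv1", "asv2", "asv3", "asv4")))

# Pretend the two low-abundance ASVs were flagged as chimeras and dropped.
seqtab_toy_nochim <- seqtab_toy[, 1:2, drop = FALSE]

# Fraction of ASVs (columns) kept vs. fraction of reads (counts) kept.
asv_kept  <- ncol(seqtab_toy_nochim) / ncol(seqtab_toy)
read_kept <- sum(seqtab_toy_nochim) / sum(seqtab_toy)

# The same ASV-level ratio with the counts from my run.
asv_kept_real <- 9822 / 29890
```

With my real data the ASV-level ratio is roughly one third, while the read-level ratio is the 82% quoted above, so the removed ASVs account for a comparatively small share of the reads.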

Using the ITS pipeline, I'm pretty sure all the primers were removed. Do you have any ideas or suggestions?

Another question: is there a "suggested" way to deal with a set of primers like this?

341F (5'-CCTACGGGNGGCWGCAG-3')
341Fb (5'-TCCTACGGGNGGCWGCAG-3')
341Fc (5'-ATCCTACGGGNGGCWGCAG-3')
341Fd (5'-TGTCCTACGGGNGGCWGCAG-3')
785R (5'-GACTACHVGGGTATCTAATCC-3')

These were all used together in order to deal better with region variability.
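As a sanity check on the primer set, here is a small base-R sketch (the variable names are my own) confirming that each forward variant is just 341F with a few extra bases at the 5' end. If that holds, my assumption is that a single unanchored 5' match on the core 341F sequence (cutadapt `-g`, as in the ITS tutorial) would trim all four variants, since the match and everything before it are removed. I have not verified this on my data.

```r
# Core forward primer and the staggered variants listed above.
core <- "CCTACGGGNGGCWGCAG"
variants <- c("341F"  = "CCTACGGGNGGCWGCAG",
              "341Fb" = "TCCTACGGGNGGCWGCAG",
              "341Fc" = "ATCCTACGGGNGGCWGCAG",
              "341Fd" = "TGTCCTACGGGNGGCWGCAG")

# Does every variant end with the core primer sequence?
ends_with_core <- endsWith(variants, core)

# The extra 5' bases in front of the core primer ("" for 341F itself).
spacer <- substr(variants, 1, nchar(variants) - nchar(core))
```

Here `ends_with_core` is TRUE for all four primers and the spacers are 0 to 3 bases long, so the variants differ only in their 5' prefix.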

luigallucci, Jun 30 '24 21:06