
Large loss of reads during processing: PacBio HiFi reads

Open HitMonk opened this issue 5 months ago • 4 comments

Hello all! I'm working with PacBio reads for the very first time. Compared to the tutorial, I'm losing a really large number of reads. Below is the table:

Sample ccs primers filtered denoised
EC-CD16-16S 39295 29158 8115 2212
EC-CD24-16S 56231 41424 14767 9685
EC-CD8-16S 76440 57378 22369 14828

Additionally, why do I see a drop in read number after primer trimming?
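
To see where the losses concentrate, the counts in the table can be turned into per-step retention. This is a minimal base R sketch; the `counts` data frame and the `kept` name are illustrative and not part of the original commands.

```r
# Read counts copied from the table above
counts <- data.frame(
  sample   = c("EC-CD16-16S", "EC-CD24-16S", "EC-CD8-16S"),
  ccs      = c(39295, 56231, 76440),
  primers  = c(29158, 41424, 57378),
  filtered = c(8115, 14767, 22369),
  denoised = c(2212, 9685, 14828)
)

steps <- c("ccs", "primers", "filtered", "denoised")

# Percent of reads kept at each step, relative to the step before it
kept <- round(100 * counts[, steps[-1]] / counts[, steps[-length(steps)]], 1)
rownames(kept) <- counts$sample
kept
```

For these three samples, roughly a quarter of the reads are lost at primer removal, and the largest relative drop is at the filtering step.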

Below are all the commands that I have used:

Primer trimming:

Give path and trim primers

F27 <- "AGRGTTYGATYMTGGCTCAG"
R1492 <- "RGYTACCTTGTTACGACTT"
nops <- file.path(path, "noprimers", basename(fns))
prim <- removePrimers(fns, nops, primer.fwd=F27, primer.rev=dada2:::rc(R1492), orient=TRUE)
prim
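
For reference, the sequence passed as `primer.rev` is the reverse complement of `R1492`. The base R sketch below shows what that string looks like for an IUPAC primer. `rc_iupac` is a hypothetical helper written for illustration; it is assumed, not verified, to give the same result as the internal `dada2:::rc` for this primer.

```r
# Hypothetical helper: reverse complement of a primer with IUPAC ambiguity codes
rc_iupac <- function(primer) {
  # Complement each base, mapping ambiguity codes to their complements
  comp <- chartr("ACGTRYKMBDHVSWN", "TGCAYRMKVHDBSWN", toupper(primer))
  # Reverse the complemented string
  vapply(strsplit(comp, ""),
         function(ch) paste(rev(ch), collapse = ""),
         character(1))
}

R1492 <- "RGYTACCTTGTTACGACTT"
rc_iupac(R1492)
# "AAGTCGTAACAAGGTARCY"
```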

Quality filtering and trimming:

# Note: this first assignment is overwritten by the next line and has no effect
filts <- file.path(path, "noprimers", "filtered", basename(fns))
filts <- file.path(path, "filtered_fastptrim", paste0(sample.names, "_filt.fastq.gz"))
out <- filterAndTrim(nops, filts, maxEE=2, rm.phix=TRUE, minQ=3, minLen=1000,
                     maxN=0, compress=TRUE, multithread=TRUE)
out
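
Since `maxEE=2` is applied to whole long reads, it may help to see how expected errors scale with read length. Expected errors are the sum of per-base error probabilities, EE = sum(10^(-Q/10)). The sketch below uses a uniform quality score and an assumed read length of 1450 bases purely for illustration; real reads have varying per-base qualities.

```r
# Expected errors for one read: sum of per-base error probabilities
expected_errors <- function(q) sum(10^(-q / 10))

read_len <- 1450  # assumed full-length 16S amplicon length (illustrative)

ee_q20 <- expected_errors(rep(20, read_len))  # every base at Q20
ee_q30 <- expected_errors(rep(30, read_len))  # every base at Q30

c(Q20 = ee_q20, Q30 = ee_q30)
```

Under these assumptions a read with uniform Q20 has 14.5 expected errors and would fail `maxEE=2`, while a uniform Q30 read has 1.45 and would pass.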

STEP 2. Dereplicate the reads

drp_filts <- derepFastq(filts, verbose=TRUE)
names(drp_filts) <- sample.names

STEP 3. Learn Errors

err2 <- learnErrors(drp_filts, errorEstimationFunction=PacBioErrfun, BAND_SIZE=32, multithread=TRUE)

STEP 4. DADA2 main inference

dd2 <- dada(drp_filts, err=err2, BAND_SIZE=32, multithread=TRUE)

Is there anything that I'm doing wrong? I'm open to all suggestions and advice.

Best.

HitMonk, Aug 30 '24 12:08