Large loss of reads during processing: PacBio HiFi reads
Hello all! I'm working with PacBio reads for the very first time. Compared to the tutorial, I'm losing a really large number of reads. Below is the table:
| Sample | ccs | primers | filtered | denoised |
|---|---|---|---|---|
| EC-CD16-16S | 39295 | 29158 | 8115 | 2212 |
| EC-CD24-16S | 56231 | 41424 | 14767 | 9685 |
| EC-CD8-16S | 76440 | 57378 | 22369 | 14828 |
Additionally, why do I see a drop in read number after primer trimming?
Below are all the commands that I have used:
Primer trimming:
library(dada2)
# Give path and trim primers
# (path/fns below are assumed, as in the DADA2 PacBio tutorial; the actual path was not shown in the post)
path <- "path/to/ccs"  # placeholder for the directory of demultiplexed CCS fastq files
fns <- list.files(path, pattern="fastq.gz", full.names=TRUE)
F27 <- "AGRGTTYGATYMTGGCTCAG"
R1492 <- "RGYTACCTTGTTACGACTT"
nops <- file.path(path, "noprimers", basename(fns))
prim <- removePrimers(fns, nops, primer.fwd=F27, primer.rev=dada2:::rc(R1492), orient=TRUE)
prim
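To see where the downstream length cutoff bites, the length distribution of the primer-trimmed reads can be inspected first (a sketch following the DADA2 PacBio workflow, using the nops paths defined above):
# Sketch: length distribution of the primer-trimmed reads; full-length 16S reads
# should cluster around ~1500 bp, and anything under minLen=1000 is removed in the next step
lens.fn <- lapply(nops, function(fn) nchar(dada2::getSequences(fn)))
lens <- do.call(c, lens.fn)
hist(lens, 100)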
Quality filtering and trimming:
# filts: one filtered-output path per sample (sample.names assumed to be derived from the fastq file names)
filts <- file.path(path, "filtered_fastptrim", paste0(sample.names, "_filt.fastq.gz"))
out <- filterAndTrim(nops, filts, maxEE=2, rm.phix=TRUE, minQ=3, minLen=1000,
                     maxN=0, compress=TRUE, multithread=TRUE)
out
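To separate the loss at this filtering step from the earlier primer-trimming loss, the returned out matrix can be summarized per sample (a small sketch; reads.in and reads.out are the standard filterAndTrim output columns):
# Sketch: fraction of primer-trimmed reads surviving filterAndTrim, per sample
round(out[, "reads.out"] / out[, "reads.in"], 2)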
STEP 2. Dereplicate the reads
drp_filts <- derepFastq(filts, verbose=TRUE)
names(drp_filts) <- sample.names
STEP 3. Learn Errors
err2 <- learnErrors(drp_filts, errorEstimationFunction=PacBioErrfun, BAND_SIZE=32, multithread=TRUE)
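Before the main inference it can help to check that the fitted PacBio error model tracks the observed error rates (a sketch using dada2's plotErrors):
# Sketch: plot observed vs. fitted error rates for the learned error model
plotErrors(err2, nominalQ=TRUE)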
STEP 4. DADA2 main inference
dd2 <- dada(drp_filts, err=err2, BAND_SIZE=32, multithread=TRUE)
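For completeness, a per-sample read-tracking table like the one at the top can be assembled from these objects (a sketch; the first two columns of prim are assumed to be the reads in/out from removePrimers, and out uses the standard filterAndTrim column names):
# Sketch: track read counts through the pipeline (ccs -> primers -> filtered -> denoised)
track <- cbind(ccs=prim[, 1], primers=prim[, 2],
               filtered=out[, "reads.out"],
               denoised=sapply(dd2, function(x) sum(getUniques(x))))
track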
Is there anything I'm doing wrong? I'm open to any suggestions and advice.
Best.