dada2 icon indicating copy to clipboard operation
dada2 copied to clipboard

Removing reads with zero length when cutting off the primer sequences using cutadapt in DADA2

Open Ecotone23 opened this issue 1 year ago • 2 comments

Hi,

I am running some scripts modified from DADA2 ITS tutorial(1.8).

I used the following R scripts to remove the primers:

cutadapt <- "/opt/miniconda3/envs/cutadaptenv/bin/cutadapt" system2(cutadapt, args = "--version")

path.cut <- file.path(path, "cutadapt") if(!dir.exists(path.cut)) dir.create(path.cut) fnFs.cut <- file.path(path.cut, basename(fnFs)) fnRs.cut <- file.path(path.cut, basename(fnRs)) FWD.RC <- dada2:::rc(FWD) REV.RC <- dada2:::rc(REV)

Trim FWD and the reverse-complement of REV off of R1 (forward reads)

R1.flags <- paste("-g", FWD, "-a", REV.RC)

Trim REV and the reverse-complement of FWD off of R2 (reverse reads)

R2.flags <- paste("-G", REV, "-A", FWD.RC)

Run Cutadapt

for(i in seq_along(fnFs)) { system2(cutadapt, args = c(R1.flags, R2.flags, "-n", 2, # -n 2 required to remove FWD and REV from reads "-o", fnFs.cut[i], "-p", fnRs.cut[i], fnFs.filtN[i], #output files fnRs.filtN[i]))} # input files } input files

The primers were successfully removed. Then I had names for primer-free fastq files with the following R scripts:

Forward and reverse fastq filenames have the format:

cutFs <- sort(list.files(path.cut, pattern = "_L001_R1_001.fastq", full.names = TRUE)) cutRs <- sort(list.files(path.cut, pattern = "_L001_R2_001.fastq", full.names = TRUE))

Next step, I tried to inspect the quality of the files. However, these files cannot be plotted by plotQualityProfile(cutFs[1:2]). The error is shown as: Error: BiocParallel errors 1 remote errors, element index: 1 0 unevaluated and other errors first remote error: Error in density.default(qscore): 'x' contains missing values

I ran readFastq(cutFs[1:2]) and found there are many zero-length reads:

class: ShortReadQ length: 50486 reads; width: 0..250 cycles

So I wonder how I can remove these zero-reads from each sample? I know that in cutadapt function running in terminal, we many use the parameters "-m, --minimum-length" to specify the minimum length of each sample. But I don't know how to do this in R scripts, especially how to incorporate this terminal command into the below R scripts when removing the primers:

for(i in seq_along(fnFs)) { system2(cutadapt, args = c(R1.flags, R2.flags, "-n", 2, # -n 2 required to remove FWD and REV from reads "-o", fnFs.cut[i], "-p", fnRs.cut[i], fnFs.filtN[i], #output files fnRs.filtN[i]))} # input files } input files Thank you.

Gabriel

Ecotone23 avatar Aug 04 '22 20:08 Ecotone23

In the system2 call, you can add more flags and associated arguments using the same syntax you see there already.

So add, system2(..., "-m", 20) to add -m 20 to the command line argument called by system2.

benjjneb avatar Aug 05 '22 13:08 benjjneb

It works! The zero-length reads were removed and the error did not show up. Thank you so much Benjamin!

Ecotone23 avatar Aug 05 '22 17:08 Ecotone23