Removing reads with zero length when cutting off the primer sequences using cutadapt in DADA2

Open Ecotone23 opened this issue 1 year ago • 2 comments

Hi,

I am running some scripts modified from DADA2 ITS tutorial(1.8).

I used the following R scripts to remove the primers:

cutadapt <- "/opt/miniconda3/envs/cutadaptenv/bin/cutadapt" system2(cutadapt, args = "--version")

path.cut <- file.path(path, "cutadapt") if(!dir.exists(path.cut)) dir.create(path.cut) fnFs.cut <- file.path(path.cut, basename(fnFs)) fnRs.cut <- file.path(path.cut, basename(fnRs)) FWD.RC <- dada2:::rc(FWD) REV.RC <- dada2:::rc(REV)

Trim FWD and the reverse-complement of REV off of R1 (forward reads)

R1.flags <- paste("-g", FWD, "-a", REV.RC)

Trim REV and the reverse-complement of FWD off of R2 (reverse reads)

R2.flags <- paste("-G", REV, "-A", FWD.RC)

Run Cutadapt

for(i in seq_along(fnFs)) { system2(cutadapt, args = c(R1.flags, R2.flags, "-n", 2, # -n 2 required to remove FWD and REV from reads "-o", fnFs.cut[i], "-p", fnRs.cut[i], fnFs.filtN[i], #output files fnRs.filtN[i]))} # input files } input files

The primers were successfully removed. Then I had names for primer-free fastq files with the following R scripts:

Forward and reverse fastq filenames have the format:

cutFs <- sort(list.files(path.cut, pattern = "_L001_R1_001.fastq", full.names = TRUE)) cutRs <- sort(list.files(path.cut, pattern = "_L001_R2_001.fastq", full.names = TRUE))

Next step, I tried to inspect the quality of the files. However, these files cannot be plotted by plotQualityProfile(cutFs[1:2]). The error is shown as: Error: BiocParallel errors 1 remote errors, element index: 1 0 unevaluated and other errors first remote error: Error in density.default(qscore): 'x' contains missing values

I ran readFastq(cutFs[1:2]) and found there are many zero-length reads:

class: ShortReadQ length: 50486 reads; width: 0..250 cycles

So I wonder how I can remove these zero-reads from each sample? I know that in cutadapt function running in terminal, we many use the parameters "-m, --minimum-length" to specify the minimum length of each sample. But I don't know how to do this in R scripts, especially how to incorporate this terminal command into the below R scripts when removing the primers:

Gabriel

Aug 04 '22 20:08 Ecotone23

In the system2 call, you can add more flags and associated arguments using the same syntax you see there already.

So add, system2(..., "-m", 20) to add -m 20 to the command line argument called by system2.

Aug 05 '22 13:08 benjjneb

It works! The zero-length reads were removed and the error did not show up. Thank you so much Benjamin!

Aug 05 '22 17:08 Ecotone23

dada2 dada2 copied to clipboard

Removing reads with zero length when cutting off the primer sequences using cutadapt in DADA2

Trim FWD and the reverse-complement of REV off of R1 (forward reads)

Trim REV and the reverse-complement of FWD off of R2 (reverse reads)

Run Cutadapt

Forward and reverse fastq filenames have the format:

dada2
dada2 copied to clipboard