dada2
dada2 copied to clipboard
Removing reads with zero length when cutting off the primer sequences using cutadapt in DADA2
Hi,
I am running some scripts modified from DADA2 ITS tutorial(1.8).
I used the following R scripts to remove the primers:
cutadapt <- "/opt/miniconda3/envs/cutadaptenv/bin/cutadapt" system2(cutadapt, args = "--version")
path.cut <- file.path(path, "cutadapt") if(!dir.exists(path.cut)) dir.create(path.cut) fnFs.cut <- file.path(path.cut, basename(fnFs)) fnRs.cut <- file.path(path.cut, basename(fnRs)) FWD.RC <- dada2:::rc(FWD) REV.RC <- dada2:::rc(REV)
Trim FWD and the reverse-complement of REV off of R1 (forward reads)
R1.flags <- paste("-g", FWD, "-a", REV.RC)
Trim REV and the reverse-complement of FWD off of R2 (reverse reads)
R2.flags <- paste("-G", REV, "-A", FWD.RC)
Run Cutadapt
for(i in seq_along(fnFs)) { system2(cutadapt, args = c(R1.flags, R2.flags, "-n", 2, # -n 2 required to remove FWD and REV from reads "-o", fnFs.cut[i], "-p", fnRs.cut[i], fnFs.filtN[i], #output files fnRs.filtN[i]))} # input files } input files
The primers were successfully removed. Then I had names for primer-free fastq files with the following R scripts:
Forward and reverse fastq filenames have the format:
cutFs <- sort(list.files(path.cut, pattern = "_L001_R1_001.fastq", full.names = TRUE)) cutRs <- sort(list.files(path.cut, pattern = "_L001_R2_001.fastq", full.names = TRUE))
Next step, I tried to inspect the quality of the files. However, these files cannot be plotted by plotQualityProfile(cutFs[1:2]). The error is shown as: Error: BiocParallel errors 1 remote errors, element index: 1 0 unevaluated and other errors first remote error: Error in density.default(qscore): 'x' contains missing values
I ran readFastq(cutFs[1:2]) and found there are many zero-length reads:
class: ShortReadQ length: 50486 reads; width: 0..250 cycles
So I wonder how I can remove these zero-reads from each sample? I know that in cutadapt function running in terminal, we many use the parameters "-m, --minimum-length" to specify the minimum length of each sample. But I don't know how to do this in R scripts, especially how to incorporate this terminal command into the below R scripts when removing the primers:
for(i in seq_along(fnFs)) { system2(cutadapt, args = c(R1.flags, R2.flags, "-n", 2, # -n 2 required to remove FWD and REV from reads "-o", fnFs.cut[i], "-p", fnRs.cut[i], fnFs.filtN[i], #output files fnRs.filtN[i]))} # input files } input files Thank you.
Gabriel
In the system2
call, you can add more flags and associated arguments using the same syntax you see there already.
So add, system2(..., "-m", 20)
to add -m 20
to the command line argument called by system2
.
It works! The zero-length reads were removed and the error did not show up. Thank you so much Benjamin!