rnaseq
FASTP trimming Smart3-seq data removes all reads (UMI discard read2)
Description of the bug
I'm trying to process bulk Smart3-seq data using the pipeline (related Slack discussion here). In my case, the FASTQ read structure is the following:
R1: 6N UMI - GGG - transcript [- polyA - adaptors]
R2: 6N UMI - T
I specify the parameters of UMITools to construct the UMI from R1+R2 and then discard R2 (see below).
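To make the read structure concrete, here is a minimal bash sketch that mimics the two patterns on a made-up read pair. It is an illustration only, not how UMI-tools is implemented: `extract_umi` is a hypothetical helper, the sequences are invented, bash ERE has no named groups so plain capture groups stand in for `umi_1`/`umi_2`, and joining the two UMIs as R1 then R2 is an assumption.

```bash
#!/usr/bin/env bash
# Toy illustration (hypothetical helper, invented sequences).
# Mimics: R1 = 6N UMI - GGG - transcript, R2 = 6N UMI - T

extract_umi() {
  local r1="$1" r2="$2"
  local re1='^(.{6})GGG(.*)$'   # R1: UMI, discarded GGG, rest of the read
  local re2='^(.{6})T'          # R2: UMI, discarded T
  local umi1 umi2 rest

  [[ $r1 =~ $re1 ]] || return 1
  umi1="${BASH_REMATCH[1]}"
  rest="${BASH_REMATCH[2]}"

  [[ $r2 =~ $re2 ]] || return 1
  umi2="${BASH_REMATCH[1]}"

  # Assumption: the combined UMI is the R1 UMI followed by the R2 UMI.
  printf '%s\t%s\n' "${umi1}${umi2}" "$rest"
}

# Prints the combined UMI and the remaining R1 sequence, tab-separated:
# ACGTACTTGGCC	TTTCCCAAA
extract_umi "ACGTACGGGTTTCCCAAA" "TTGGCCT"
```

After this step only the R1 remainder carries transcript sequence, which is why R2 is discarded.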
However, the trimmer (FASTP) afterwards reports that all reads are low quality or too short.
[nf-core/rnaseq] Pipeline completed successfully with skipped sampl(es)
[nf-core/rnaseq] Please check MultiQC report: 18/18 samples skipped since they failed 10000 trimmed read threshold.
I believe that this is because FASTP is called with both R1 and R2, instead of discarding R2 (see full log file below), which produces empty .fastp.fastq.gz files:
# all reads removed
fastp --in1 FF230228_13_1.fastq.gz --in2 FF230228_13_2.fastq.gz \
  --out1 FF230228_13_1.fastp.fastq.gz --out2 FF230228_13_2.fastp.fastq.gz
The evidence for this is that running FASTP manually on R1 only retains most reads:
# retains most reads
fastp --in1 FF230228_13_1.fastq.gz --out1 FF230228_13_1.fastp.fastq.gz
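To compare the two FASTP runs above, the number of reads in each output can be counted directly. This is a minimal sketch, assuming standard four-line FASTQ records and an available `gzip`; `count_reads` is a hypothetical helper, not part of the pipeline or of FASTP.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count reads in a gzipped FASTQ file.
# Assumes each read occupies exactly four lines (no wrapped sequences).

count_reads() {
  local lines
  lines=$(gzip -dc "$1" | wc -l)
  echo $(( lines / 4 ))
}
```

For example, `count_reads FF230228_13_1.fastp.fastq.gz` would be expected to print 0 after the paired run and a non-zero number after the R1-only run.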
A similar issue was fixed by exposing the --umi_discard_read parameter, but I guess FASTP trimming was not included: https://github.com/nf-core/rnaseq/issues/750.
Workaround: using TrimGalore (the default) instead of FASTP processes the samples correctly (and outputs only one FASTQ per sample after trimming).
Command used and terminal output
nextflow run nf-core/rnaseq -r 3.11.0 \
  --input samples.csv \
  --with_umi \
  --umitools_extract_method regex \
  --umitools_bc_pattern '(?P<umi_1>.{6})(?P<discard_1>GGG).*' \
  --umitools_bc_pattern2 '(?P<umi_2>.{6})(?P<discard_2>T).*' \
  --umi_discard_read 2 \
  --umitools_dedup_stats true \
  --trimmer fastp # defaulting to trimgalore works as expected
Relevant files
FF230228_13.fastp.log nextflow.log
System information
Nextflow = 22.10.1
Ubuntu Linux = 20.04.6 LTS
nf-core/rnaseq = 3.11.0
Executor = local