FASTP trimming Smart3-seq data removes all reads (UMI discard read2)

Open mschubert opened this issue 5 months ago • 0 comments

Description of the bug

I'm trying to process bulk Smart3-seq data using the pipeline (related Slack discussion here). In my case, the FASTQ read structure is the following:

R1: 6N UMI - GGG - transcript [- polyA - adaptors]
R2: 6N UMI - T

I specify the parameters of UMITools to construct the UMI from R1+R2 and then discard R2 (see below).

However, the trimmer (FASTP) afterwards reports that all reads are low quality or too short.

-[nf-core/rnaseq] Pipeline completed successfully with skipped sampl(es)-
-[nf-core/rnaseq] Please check MultiQC report: 18/18 samples skipped since they failed 10000 trimmed read threshold.-

I believe that this is because FASTP is called with both R1 and R2, instead of discarding R2 (see full log file below), which produces empty .fastp.fastq.gz files:

# all reads removed
fastp --in1 FF230228_13_1.fastq.gz --in2 FF230228_13_2.fastq.gz \
    --out1 FF230228_13_1.fastp.fastq.gz --out2 FF230228_13_2.fastp.fastq.gz

I suspect this because running FASTP manually on R1 only retains most reads:

# retains most reads
fastp --in1 FF230228_13_1.fastq.gz --out1 FF230228_13_1.fastp.fastq.gz
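
To compare the two FASTP runs, the number of reads left in each output file can be counted directly. The following is a minimal Python sketch, not part of the pipeline: `count_reads` and `demo` are hypothetical helper names, and it assumes well-formed FASTQ with exactly four lines per record. The demo writes synthetic records to a temporary file rather than touching the real data.

```python
import gzip
import os
import tempfile


def count_reads(path):
    """Count FASTQ records in a gzip-compressed file (assumes 4 lines per record)."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def demo():
    """Write a two-record and an empty FASTQ to a temp dir and count both."""
    with tempfile.TemporaryDirectory() as tmp:
        full = os.path.join(tmp, "full.fastq.gz")
        empty = os.path.join(tmp, "empty.fastq.gz")
        with gzip.open(full, "wt") as handle:
            # Synthetic records, for illustration only
            handle.write("@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n")
        with gzip.open(empty, "wt") as handle:
            handle.write("")
        return count_reads(full), count_reads(empty)


if __name__ == "__main__":
    print(demo())
```

Calling `count_reads("FF230228_13_1.fastp.fastq.gz")` on the output of each command above would show whether the file is empty or not.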

A similar issue was fixed by exposing the --umi_discard_read parameter, but I guess FASTP trimming was not included: https://github.com/nf-core/rnaseq/issues/750.

Workaround: Using TrimGalore (the default) instead of FASTP processes the samples correctly and outputs only one FASTQ per sample after trimming.

Command used and terminal output

nextflow run nf-core/rnaseq -r 3.11.0 \
    --input samples.csv \
    --with_umi \
    --umitools_extract_method regex \
    --umitools_bc_pattern '(?P<umi_1>.{6})(?P<discard_1>GGG).*' \
    --umitools_bc_pattern2 '(?P<umi_2>.{6})(?P<discard_2>T).*' \
    --umi_discard_read 2 \
    --umitools_dedup_stats true \
    --trimmer fastp # defaulting to trimgalore works as expected
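
To illustrate what the two barcode patterns in this command capture, here is a minimal Python sketch. It is not the UMI-tools implementation: it assumes the standard-library `re` module (UMI-tools itself uses the `regex` package), the helper `extract` is hypothetical, and the reads are synthetic. Groups named `umi_*` are collected as the UMI, groups named `discard_*` are dropped, and everything else is kept as the read sequence.

```python
import re

# Patterns copied from the command above
R1_PATTERN = r"(?P<umi_1>.{6})(?P<discard_1>GGG).*"
R2_PATTERN = r"(?P<umi_2>.{6})(?P<discard_2>T).*"


def extract(pattern, seq):
    """Return (umi, remaining_sequence), or None if the read does not match."""
    m = re.match(pattern, seq)
    if m is None:
        return None
    names = sorted(m.groupdict())
    # Concatenate all umi_* groups in name order
    umi = "".join(m.group(n) for n in names if n.startswith("umi_"))
    # Remove the spans of umi_* and discard_* groups from the read
    drop = sorted(
        m.span(n)
        for n in names
        if n.startswith(("umi_", "discard_")) and m.span(n)[0] >= 0
    )
    kept, pos = [], 0
    for start, end in drop:
        kept.append(seq[pos:start])
        pos = end
    kept.append(seq[pos:])
    return umi, "".join(kept)


if __name__ == "__main__":
    # Synthetic reads following the R1/R2 structure described in this issue
    print(extract(R1_PATTERN, "ACGTACGGGTTGACCA"))
    print(extract(R2_PATTERN, "TGCATGT"))
```

In this synthetic example R2 consists of only the UMI and the T, so nothing is left of it after extraction. A paired-end trimming call would then receive an empty mate for every pair.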

Relevant files

FF230228_13.fastp.log nextflow.log

System information

Nextflow = 22.10.1
Ubuntu Linux = 20.04.6 LTS
nf-core/rnaseq = 3.11.0
local executor

mschubert, Jan 23 '24 15:01