fastp Incorrect adapter detected?

Sometimes fastp detects what could be real genomic repetitive sequence as an adapter, for example:

Detected read1 adapter: | AACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACC

Has anyone else seen this behaviour?

My command was:

fastp -w 16 --dont_overwrite -Q -z 1 --in1 SB12_R1.fq.gz --in2 SB12_R2.fq.gz --out1 ../01_adapters_removed/tmp_fastp/SB12_R1.fq.gz --out2 ../01_adapters_removed/tmp_fastp/SB12_R2.fq.gz --detect_adapter_for_pe -l 21 --json ../01_adapters_removed/SB12.fastp.adapters.json --html ../01_adapters_removed/SB12.fastp.adapters.html

Searching the literature, AAACCCT seems to be a common motif in telomeric repeats in plants: https://www.nature.com/articles/nature15714/tables/1

How can I avoid this false detection?

Edgardo

May 31 '19 13:05 edgardomortiz

Thanks for your info.

Which version did you use? Can you upload a file with the first 100K reads here, so I can reproduce the problem?

As a workaround, you can remove --detect_adapter_for_pe. Most adapters will still be trimmed by overlap detection.

May 31 '19 14:05 sfchen

I am using v0.20.0. The telomeric sequence is not detected as an adapter when I remove --detect_adapter_for_pe from the command. I only included that option because I read in your manual that it makes detection more sensitive. I have also attached the first 100k reads, thanks for the help: 100k_R1.fq.gz 100k_R2.fq.gz

May 31 '19 14:05 edgardomortiz

Thanks, I will try to reproduce this issue when I get a chance.

Jun 05 '19 06:06 sfchen

Hi, @sfchen

I have the same issue as @edgardomortiz. Has there been any progress or a solution?

Here is the command I use

fastp -w 6 -6 -i s1.R1.fastq.gz -I s1.R2.fastq.gz --detect_adapter_for_pe --length_required 45 -o s1.Clean.R1.fastq.gz -O s1.Clean.R2.fastq.gz --json s1.fastp.json --html s1.fastp.html

fastp v0.20.0, time used: 397 seconds
Detecting adapter sequence for read1...
CTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAA

Detecting adapter sequence for read2...
CTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAA

Aug 09 '19 03:08 baozg

Hi, @sfchen

I also have the same issue as @edgardomortiz and @baozg.

Here's the command that I use:

fastp --detect_adapter_for_pe \
--unqualified_percent_limit 50 \
--cut_right --cut_right_window_size 4 --cut_right_mean_quality 20 \
--correction \
--in1 SRR10260015_1.fastq.gz \
--in2 SRR10260015_2.fastq.gz \
--out1 SRR10260015_1_trimmed.fastq.gz \
--out2 SRR10260015_2_trimmed.fastq.gz \
--unpaired1 SRR10260015_1_passed.fastq.gz \
--unpaired2 SRR10260015_2_passed.fastq.gz \
--failed_out fail_out.fastq.gz \
--thread 4 \
2> fastp_log

and here's the stderr on adapter detection:

Detecting adapter sequence for read1...
CTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGT

Detecting adapter sequence for read2...
CAGACAGACAGACAGACAGACAGACAGACAGACAGACAGACAGACAGACAGACAGACAGA

As mentioned above, removing --detect_adapter_for_pe still allows adapter-contaminated reads to be trimmed based on overlap analysis, while avoiding the mis-detection of repeated sequences as adapters.

Oct 26 '20 15:10 Sfeng666

I also had the same issue. In 4 of my 32 datasets (bird WGS), fastp version 0.23.2 detects the telomeric repeat (CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC) as an adapter when I use --detect_adapter_for_pe.

Mar 04 '22 17:03 weirlab
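
All of the "adapters" reported in this thread look like short tandem repeats rather than real adapter sequences. One way to flag such cases automatically is to check whether the sequence fastp reports is just a short unit repeated end to end. The following is a minimal sketch of that check, not part of fastp; the function name `smallest_repeat_unit` and the `max_unit` cutoff of 10 bp are assumptions made for illustration, and the sequences are copied from the posts above.

```python
def smallest_repeat_unit(seq, max_unit=10):
    """Return the shortest unit (at most max_unit bp) whose tandem
    repetition reproduces seq, or None if no such unit exists.

    Hypothetical helper for illustration; fastp has no such function.
    """
    seq = seq.strip().upper()
    for k in range(1, min(max_unit, len(seq) // 2) + 1):
        # seq has period k if every base equals the base k positions earlier
        if all(seq[i] == seq[i - k] for i in range(k, len(seq))):
            return seq[:k]
    return None


# Sequences reported by fastp as adapters in this thread
reported = {
    "edgardomortiz": "AACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACC",
    "baozg": "CTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAA",
    "Sfeng666 read1": "CTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGTCTGT",
    "Sfeng666 read2": "CAGACAGACAGACAGACAGACAGACAGACAGACAGACAGACAGACAGACAGACAGACAGA",
    "weirlab": "CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC",
}

for user, seq in reported.items():
    unit = smallest_repeat_unit(seq)
    if unit is None:
        print(f"{user}: no short tandem repeat found")
    else:
        print(f"{user}: looks like a tandem repeat of {unit} ({len(unit)} bp)")
```

If a detected adapter is flagged this way, rerunning without --detect_adapter_for_pe, as suggested above, is probably the safer choice. A genuine adapter such as the Illumina TruSeq sequence AGATCGGAAGAGCACACGTCTGAACTCCAGTCA is not periodic, so the check returns None for it.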
