fastp
A serious adapter trim bug for targeted sequencing data? use cutadapt instead
In the following example I want to remove the polyA tail, so the trimmed read should end in CTTTT. With the latest fastp (v0.23.2), running fastp -i test.fastq --adapter_sequence AAAAAAAAAA --stdout leaves only CTTT; the last T is missing. I think this is because TAAAAAAAAA matches the given adapter sequence with only 1 mismatch. cutadapt does not have this problem, even though it also allows a 10% error rate: cutadapt -a AAAAAAAAAA test.fastq > out.fastq
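To illustrate the suspected cause, here is a minimal Python sketch of a leftmost-match scan that tolerates mismatches. It is not fastp's actual algorithm; the function name first_adapter_hit and the shortened toy read are made up for this illustration. It only shows how allowing 1 mismatch can move the cut point one base to the left, so that the last T is lost.

```python
def first_adapter_hit(read, adapter, max_mismatches):
    """Return the leftmost start where adapter matches read with at most
    max_mismatches substitutions (Hamming distance), or -1 if none.
    Hypothetical helper for illustration, not taken from fastp."""
    n = len(adapter)
    for start in range(len(read) - n + 1):
        window = read[start:start + n]
        mismatches = sum(1 for a, b in zip(window, adapter) if a != b)
        if mismatches <= max_mismatches:
            return start
    return -1


# Toy read: the expected tail CTTTT followed by a polyA stretch.
read = "GGTGCTTTT" + "A" * 19 + "G"
adapter = "AAAAAAAAAA"

for allowed in (0, 1):
    pos = first_adapter_hit(read, adapter, allowed)
    trimmed = read[:pos] if pos >= 0 else read
    print(allowed, trimmed)
# prints:
# 0 GGTGCTTTT
# 1 GGTGCTTT
```

With no mismatches allowed, the cut lands on the first A and the read keeps CTTTT. With 1 mismatch allowed, the window TAAAAAAAAA already qualifies, so the cut lands one base earlier and the read ends in CTTT, which is what the fastp output below shows.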
more test.fastq
@M04990:162:000000000-GCHR8:1:1101:15420:1384 1:N:0:ATTCAGAA+ATAGAGGC
GGCCACCTACCTAAGAACCATCCGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTAAAAAAAAAAAAAAAAAAAGAAAAAAAAAAAAAAAGATGGGGAGGGCACACGTCTGAACTCCGGCACATTTCAAAAATTTTTTTTTTG
TTTTTTTTTTTTTTTTTTATTTAATTTTTTTTTTTTTTTTTTGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGTTTTGTTGTGTTGTGTTGTTTGTTGTTGTG
+
ABABCCFFFFFFGGGGGGGGGGHGGHHGHGHHHHHGHHHGHHHHHHHHHHHHHHHGHHHHHHHGGHHGHHHHHHHHHHHHHGHHHGHHGGCGGHGGGGGHHHHHHHHHGGGGGGGGGGGGGC.CHHHHGGGGGGGGG?..;0;A..9--.../.-;./.00000B.----/0:;000;0..0;9B=-:---
/;:FA:CCD;C::-:---;00000;00;B-;;BAFF@=@FCF-;::FEAC;;;;@9:@@-;9;@:@CF-;:BCDCDFF@;;B/;.;9.;.;;.;;/9;;:;/;B./.9;.
-------------------------------------------------------------------------------------------
fastp -i test.fastq --adapter_sequence=AAAAAAAAAA --stdout -Q -q 0 -G -u 0 -t 0
Streaming uncompressed reads to STDOUT...
@M04990:162:000000000-GCHR8:1:1101:15420:1384 1:N:0:ATTCAGAA+ATAGAGGC
GGCCACCTACCTAAGAACCATCCGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTT
+
ABABCCFFFFFFGGGGGGGGGGHGGHHGHGHHHHHGHHHGHHHHHHHHHHHHHHHGHHHHHHHGGHHGHHHHHHHHHHHHHGHHHGHHGGCGGHGGGGGHHH
Read1 before filtering:
total reads: 1
total bases: 301
Q20 bases: 234(77.7409%)
Q30 bases: 177(58.804%)
Read1 after filtering:
total reads: 1
total bases: 102
Q20 bases: 102(100%)
Q30 bases: 102(100%)
Filtering result:
reads passed filter: 1
reads failed due to low quality: 0
reads failed due to too many N: 0
reads failed due to too short: 0
reads with adapter trimmed: 1
bases trimmed due to adapters: 199
Duplication rate (may be overestimated since this is SE data): 0%
JSON report: fastp.json
HTML report: fastp.html
fastp -i test.fastq --adapter_sequence=AAAAAAAAAA --stdout -Q -q 0 -G -u 0 -t 0
fastp v0.23.2, time used: 0 seconds
I added -Q -q 0 -G -u 0 -t 0 in case fastp removes bases because of other filters.