
Trimming the tail by default?

Open • xmzhuo opened this issue 3 years ago • 1 comment

Hey OpenGene team,

I am new to fastp. I am trying to use fastp for UMI processing.

The command I used and its output:

fastp -i ggal_gut_1.fq -I ggal_gut_2.fq -U --umi_loc=per_read --umi_len=3 --umi_skip=2 -o ggal_gut_1.fastp_xg.fastq.gz -O ggal_gut_2.fastp_xg.fastq.gz

Read1 before filtering:
total reads: 2937
total bases: 308324
Q20 bases: 301739(97.8643%)
Q30 bases: 287872(93.3667%)

Read2 before filtering:
total reads: 2937
total bases: 308385
Q20 bases: 299658(97.1701%)
Q30 bases: 284120(92.1316%)

Read1 after filtering:
total reads: 2888
total bases: 263359
Q20 bases: 259764(98.6349%)
Q30 bases: 248633(94.4084%)

Read2 after filtering:
total reads: 2888
total bases: 263359
Q20 bases: 258650(98.2119%)
Q30 bases: 246495(93.5966%)

Filtering result:
reads passed filter: 5776
reads failed due to low quality: 88
reads failed due to too many N: 10
reads failed due to too short: 0
reads with adapter trimmed: 2388
bases trimmed due to adapters: 51037
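The filtering summary counts read1 and read2 individually rather than as pairs. The quick arithmetic check below uses numbers copied from the output above. It is only a consistency check on the printed figures, not something fastp runs. It also assumes that a pair which fails filtering is counted once per mate, which the totals are consistent with.

```python
# Counts copied from the fastp summary above.
read1_before = read2_before = 2937
read1_after = read2_after = 2888
passed, low_quality, too_many_n, too_short = 5776, 88, 10, 0

# "reads passed filter" counts read1 and read2 separately, i.e. 2 x 2888.
assert passed == read1_after + read2_after

# Every input read either passed or failed for one of the listed reasons.
failed = low_quality + too_many_n + too_short
assert passed + failed == read1_before + read2_before

# Assumption: a failing pair is dropped as a whole and counted once per mate.
pairs_lost = failed // 2
assert pairs_lost == read1_before - read1_after
print(f"{failed} reads failed, {pairs_lost} pairs removed")
```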

Duplication rate: 0.476677%

Insert size peak (evaluated by paired-end reads): 83

JSON report: fastp.json
HTML report: fastp.html

fastp v0.23.0, time used: 1 seconds

The original read1 and read2:

@SRR636272.19519409/1
GGCCCGGCAGCAGGATGATGCTCTCCCGGGCCAAGCCGGCTGTGGGGAGCACCCCGCCGCAGGGGGACAGGCGGAGGAAGAAAGGGAAGAAGGTGCCACAGATCG
+
CCCFFFFDHHD;FF=GGDHGGHIIIGHIIIBDGBFCAHG@E=6?CBDBB;?BB@BD8BB;BDB<>>;@?BB<9>&5<?288AAABDBBBBACBCAC?@AD?CAC?

@SRR636272.19519409/2
GTGGCACCTTCTTCCCTTTCTTCCTCCGCCTGTCCCCCTGCGGCGGGGGGCTCCCCACAGCCGGCTTGGGCCGGGAGAGCATCATCCTGCTGCCGGGCCAGATCG
+
@@BFFFFFHGHHHJGGGIJJJIIIJEHDGHGFBGDHIJAFGBDHGFDB&555??BBC8?8?B7<;8>>B(8<59B599<(44:A:AC@>(:@:A>BB590<<(9?

The fastp-processed read1 and read2:

@SRR636272.19519409/1:GGC_GTG
GGCAGCAGGATGATGCTCTCCCGGGCCAAGCCGGCTGTGGGGAGCACCCCGCCGCAGGGGGACAGGCGGAGGAAGAAAGGGAAGAAGGT
+
FFDHHD;FF=GGDHGGHIIIGHIIIBDGBFCAHG@E=6?CBDBB;?BB@BD8BB;BDB<>>;@?BB<9>&5<?288AAABDBBBBACBC

@SRR636272.19519409/2:GGC_GTG
ACCTTCTTCCCTTTCTTCCTCCGCCTGTCCCCCTGCGGCGGGGGGCTCCCCACAGCCGGCTTGGGCCGGGAGAGCATCATCCTGCTGCC
+
FFFHGHHHJGGGIJJJIIIJEHDGHGFBGDHIJAFGBDHGFDB&555??BBC8?8?B7<;8>>B(8<59B599<(44:A:AC@>(:@:A
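The comparison above can be made concrete with a minimal Python sketch. It strips the UMI and skipped bases from the raw reads, then reports what else is missing from the 3' end. The sketch assumes that `--umi_loc=per_read --umi_len=3 --umi_skip=2` takes the first 3 bases of each read as the UMI and discards the next 2. That assumption matches the `:GGC_GTG` read names. The helper `split_umi` is hypothetical and is not part of fastp.

```python
# Assumption: per_read UMI = first UMI_LEN bases, followed by UMI_SKIP discarded bases.
UMI_LEN = 3
UMI_SKIP = 2

# Raw and processed sequences copied from the records above.
r1 = "GGCCCGGCAGCAGGATGATGCTCTCCCGGGCCAAGCCGGCTGTGGGGAGCACCCCGCCGCAGGGGGACAGGCGGAGGAAGAAAGGGAAGAAGGTGCCACAGATCG"
r2 = "GTGGCACCTTCTTCCCTTTCTTCCTCCGCCTGTCCCCCTGCGGCGGGGGGCTCCCCACAGCCGGCTTGGGCCGGGAGAGCATCATCCTGCTGCCGGGCCAGATCG"
out1 = "GGCAGCAGGATGATGCTCTCCCGGGCCAAGCCGGCTGTGGGGAGCACCCCGCCGCAGGGGGACAGGCGGAGGAAGAAAGGGAAGAAGGT"
out2 = "ACCTTCTTCCCTTTCTTCCTCCGCCTGTCCCCCTGCGGCGGGGGGCTCCCCACAGCCGGCTTGGGCCGGGAGAGCATCATCCTGCTGCC"


def split_umi(seq):
    """Hypothetical helper: return (umi, remainder) after removing UMI_LEN + UMI_SKIP 5' bases."""
    return seq[:UMI_LEN], seq[UMI_LEN + UMI_SKIP:]


umi1, rest1 = split_umi(r1)
umi2, rest2 = split_umi(r2)
tag = f"{umi1}_{umi2}"  # matches the ":GGC_GTG" suffix on the read names

# The processed reads are prefixes of the remainders; whatever follows was cut from the 3' end.
assert rest1.startswith(out1) and rest2.startswith(out2)
tail1 = rest1[len(out1):]
tail2 = rest2[len(out2):]
print(tag, tail1, len(tail1), tail2, len(tail2))
```

This shows 11 bp removed from the 3' end of each read: GCCACAGATCG from read1 and GGGCCAGATCG from read2.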

I noticed that fastp trims the tails of both read1 and read2. What is the rationale for trimming them, and how can I disable or enable this trimming?

xmzhuo, Oct 26 '21 16:10

Without the UMI option, the tail trimming seems less aggressive. With --umi, fastp trims 11 bp (GCCACAGATCG) from the tail of read1. Without --umi, it trims only 6 bp (AGATCG).

fastp -i test_r1.fq -I test_r2.fq -o test_r1.fastpxu.fq.gz -O test_r2.fastpxu.fq.gz -D

@SRR636272.19519409/1
GGCCCGGCAGCAGGATGATGCTCTCCCGGGCCAAGCCGGCTGTGGGGAGCACCCCGCCGCAGGGGGACAGGCGGAGGAAGAAAGGGAAGAAGGTGCCAC
+
CCCFFFFDHHD;FF=GGDHGGHIIIGHIIIBDGBFCAHG@E=6?CBDBB;?BB@BD8BB;BDB<>>;@?BB<9>&5<?288AAABDBBBBACBCAC?@A

@SRR636272.19519409/2
GTGGCACCTTCTTCCCTTTCTTCCTCCGCCTGTCCCCCTGCGGCGGGGGGCTCCCCACAGCCGGCTTGGGCCGGGAGAGCATCATCCTGCTGCCGGGCC
+
@@BFFFFFHGHHHJGGGIJJJIIIJEHDGHGFBGDHIJAFGBDHGFDB&555??BBC8?8?B7<;8>>B(8<59B599<(44:A:AC@>(:@:A>BB59
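The 6 bp removed without --umi (AGATCG) match the first six bases of AGATCGGAAGAGC. That is the read-through sequence shared by the standard Illumina TruSeq adapters. The thread does not say which adapters this library used, so treat that as an assumption. The naive suffix match below only illustrates the observation. It is not how fastp detects adapters, and fastp's PE mode may rely on overlap analysis instead. The helper `adapter_suffix_len` is hypothetical.

```python
# Assumption: TruSeq-style adapters, whose read-through sequence begins with AGATCGGAAGAGC.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def adapter_suffix_len(seq: str, adapter: str = ADAPTER_PREFIX, min_len: int = 4) -> int:
    """Length of the longest 3' suffix of seq that equals a prefix of adapter (0 if none)."""
    for k in range(min(len(seq), len(adapter)), min_len - 1, -1):
        if seq.endswith(adapter[:k]):
            return k
    return 0


# 3' ends of the raw read1 and read2 shown earlier in this issue.
tail_r1 = "GAAGAAGGTGCCACAGATCG"
tail_r2 = "TCCTGCTGCCGGGCCAGATCG"

for name, tail in [("read1", tail_r1), ("read2", tail_r2)]:
    k = adapter_suffix_len(tail)
    print(name, k, tail[-k:] if k else "")
```

Both reads end in a 6 bp adapter prefix. That is the same amount trimmed in the run without --umi.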

So I additionally disabled adapter trimming with -A. This time the tails were not trimmed:

@SRR636272.19519409/1
GGCCCGGCAGCAGGATGATGCTCTCCCGGGCCAAGCCGGCTGTGGGGAGCACCCCGCCGCAGGGGGACAGGCGGAGGAAGAAAGGGAAGAAGGTGCCACAGATCG
+
CCCFFFFDHHD;FF=GGDHGGHIIIGHIIIBDGBFCAHG@E=6?CBDBB;?BB@BD8BB;BDB<>>;@?BB<9>&5<?288AAABDBBBBACBCAC?@AD?CAC?

@SRR636272.19519409/2
GTGGCACCTTCTTCCCTTTCTTCCTCCGCCTGTCCCCCTGCGGCGGGGGGCTCCCCACAGCCGGCTTGGGCCGGGAGAGCATCATCCTGCTGCCGGGCCAGATCG
+
@@BFFFFFHGHHHJGGGIJJJIIIJEHDGHGFBGDHIJAFGBDHGFDB&555??BBC8?8?B7<;8>>B(8<59B599<(44:A:AC@>(:@:A>BB590<<(9?

Just out of curiosity: the PE adapter trimming may be the result of overlap analysis. But why does the --umi option trim more "adapter sequence" from the tail?
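One simple model reproduces both numbers. It is offered as a hypothesis, not as a description of fastp's code. Suppose the UMI and skipped bases (3 + 2 = 5) are removed from the 5' end of both reads before the overlap analysis. Clipped read2 then begins 5 bases short of the fragment's far end. An overlap-based trim cuts read1 back to that point, so read1 loses 5 extra bases on top of the real adapter. The model uses a read length of 105 bp, counted from the raw reads above. It uses an insert length of 99 bp, inferred from the 6 bp trimmed without --umi. The function `overlap_trim` is hypothetical.

```python
def overlap_trim(read_len: int, insert_len: int, clip5: int = 0) -> tuple[int, int]:
    """Hypothetical model of overlap-based PE trimming.

    clip5 bases are first removed from the 5' end of BOTH reads (UMI + skip).
    Returns (bases kept in read1, bases removed from read1's 3' end).
    """
    remaining = read_len - clip5  # read1 length after 5' clipping
    # Clipped read1 starts clip5 bases into the fragment; clipped read2 starts clip5
    # bases before the fragment's far end, so the region both reads cover is:
    shared = insert_len - 2 * clip5
    if shared >= remaining:  # read1 never runs past clipped read2's 5' end
        return remaining, 0
    return shared, remaining - shared  # cut read1 where clipped read2 begins


READ_LEN, INSERT_LEN = 105, 99
print(overlap_trim(READ_LEN, INSERT_LEN))         # without --umi: (99, 6)
print(overlap_trim(READ_LEN, INSERT_LEN, 3 + 2))  # --umi_len=3 --umi_skip=2: (89, 11)
```

In this model the extra trimming equals umi_len + umi_skip (11 = 6 + 5). The kept length of 89 bp matches the processed reads above. Confirming that fastp really extracts UMIs before its overlap analysis would require checking its source.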

xmzhuo, Oct 27 '21 19:10