fastp
Trimming the tail by default?
Hey OpenGene team,
I am new to fastp and am trying to use it for UMI processing.
The command I used and its stdout:
fastp -i ggal_gut_1.fq -I ggal_gut_2.fq -U --umi_loc=per_read --umi_len=3 --umi_skip=2 -o ggal_gut_1.fastp_xg.fastq.gz -O ggal_gut_2.fastp_xg.fastq.gz
Read1 before filtering:
total reads: 2937
total bases: 308324
Q20 bases: 301739(97.8643%)
Q30 bases: 287872(93.3667%)

Read2 before filtering:
total reads: 2937
total bases: 308385
Q20 bases: 299658(97.1701%)
Q30 bases: 284120(92.1316%)

Read1 after filtering:
total reads: 2888
total bases: 263359
Q20 bases: 259764(98.6349%)
Q30 bases: 248633(94.4084%)

Read2 after filtering:
total reads: 2888
total bases: 263359
Q20 bases: 258650(98.2119%)
Q30 bases: 246495(93.5966%)

Filtering result:
reads passed filter: 5776
reads failed due to low quality: 88
reads failed due to too many N: 10
reads failed due to too short: 0
reads with adapter trimmed: 2388
bases trimmed due to adapters: 51037

Duplication rate: 0.476677%

Insert size peak (evaluated by paired-end reads): 83

JSON report: fastp.json
HTML report: fastp.html

fastp v0.23.0, time used: 1 seconds
The original read1 and read2:

@SRR636272.19519409/1
GGCCCGGCAGCAGGATGATGCTCTCCCGGGCCAAGCCGGCTGTGGGGAGCACCCCGCCGCAGGGGGACAGGCGGAGGAAGAAAGGGAAGAAGGTGCCACAGATCG
+
CCCFFFFDHHD;FF=GGDHGGHIIIGHIIIBDGBFCAHG@E=6?CBDBB;?BB@BD8BB;BDB<>>;@?BB<9>&5<?288AAABDBBBBACBCAC?@AD?CAC?
@SRR636272.19519409/2
GTGGCACCTTCTTCCCTTTCTTCCTCCGCCTGTCCCCCTGCGGCGGGGGGCTCCCCACAGCCGGCTTGGGCCGGGAGAGCATCATCCTGCTGCCGGGCCAGATCG
+
@@BFFFFFHGHHHJGGGIJJJIIIJEHDGHGFBGDHIJAFGBDHGFDB&555??BBC8?8?B7<;8>>B(8<59B599<(44:A:AC@>(:@:A>BB590<<(9?

The fastp-processed read1 and read2:

@SRR636272.19519409/1:GGC_GTG
GGCAGCAGGATGATGCTCTCCCGGGCCAAGCCGGCTGTGGGGAGCACCCCGCCGCAGGGGGACAGGCGGAGGAAGAAAGGGAAGAAGGT
+
FFDHHD;FF=GGDHGGHIIIGHIIIBDGBFCAHG@E=6?CBDBB;?BB@BD8BB;BDB<>>;@?BB<9>&5<?288AAABDBBBBACBC
@SRR636272.19519409/2:GGC_GTG
ACCTTCTTCCCTTTCTTCCTCCGCCTGTCCCCCTGCGGCGGGGGGCTCCCCACAGCCGGCTTGGGCCGGGAGAGCATCATCCTGCTGCC
+
FFFHGHHHJGGGIJJJIIIJEHDGHGFBGDHIJAFGBDHGFDB&555??BBC8?8?B7<;8>>B(8<59B599<(44:A:AC@>(:@:A
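To separate the UMI handling from the tail trimming, here is a minimal Python sketch of what the 5' end of the processed reads suggests per_read UMI extraction does with a UMI length of 3 and a skip of 2. It is not fastp's code: `extract_umi` is a hypothetical helper, and only the first 14 bases of each original read are used.

```python
def extract_umi(seq, umi_len=3, umi_skip=2):
    """Hypothetical helper: take the first umi_len bases as the UMI,
    then drop umi_skip more bases from the 5' end of the read."""
    umi = seq[:umi_len]
    rest = seq[umi_len + umi_skip:]
    return umi, rest

# First 14 bases of the original reads shown above.
r1_head = "GGCCCGGCAGCAGG"
r2_head = "GTGGCACCTTCTTC"

umi1, rest1 = extract_umi(r1_head)
umi2, rest2 = extract_umi(r2_head)

# With per_read, both UMIs end up in the read name, joined by "_".
tag = f"{umi1}_{umi2}"
print(tag)    # GGC_GTG
print(rest1)  # GGCAGCAGG
print(rest2)  # ACCTTCTTC
```

This matches the `:GGC_GTG` suffix in the read names and the new 5' starts of both processed reads, so the first 5 bases lost from each read are explained by the UMI options. The open question is only the 3' tail.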
I noticed that fastp trims the tails of both read1 and read2. What is the rationale for trimming them, and how can I enable or disable this trimming?
Without the UMI options, the tail trimming seems less aggressive: with --umi, 11 bp (GCCACAGATCG) are trimmed from the tail of read1; without --umi, only 6 bp (AGATCG) are trimmed.
fastp -i test_r1.fq -I test_r2.fq -o test_r1.fastpxu.fq.gz -O test_r2.fastpxu.fq.gz -D
@SRR636272.19519409/1
GGCCCGGCAGCAGGATGATGCTCTCCCGGGCCAAGCCGGCTGTGGGGAGCACCCCGCCGCAGGGGGACAGGCGGAGGAAGAAAGGGAAGAAGGTGCCAC
+
CCCFFFFDHHD;FF=GGDHGGHIIIGHIIIBDGBFCAHG@E=6?CBDBB;?BB@BD8BB;BDB<>>;@?BB<9>&5<?288AAABDBBBBACBCAC?@A
@SRR636272.19519409/2
GTGGCACCTTCTTCCCTTTCTTCCTCCGCCTGTCCCCCTGCGGCGGGGGGCTCCCCACAGCCGGCTTGGGCCGGGAGAGCATCATCCTGCTGCCGGGCC
+
@@BFFFFFHGHHHJGGGIJJJIIIJEHDGHGFBGDHIJAFGBDHGFDB&555??BBC8?8?B7<;8>>B(8<59B599<(44:A:AC@>(:@:A>BB59
I then disabled adapter trimming with -A; this time the tails were not trimmed:

@SRR636272.19519409/1
GGCCCGGCAGCAGGATGATGCTCTCCCGGGCCAAGCCGGCTGTGGGGAGCACCCCGCCGCAGGGGGACAGGCGGAGGAAGAAAGGGAAGAAGGTGCCACAGATCG
+
CCCFFFFDHHD;FF=GGDHGGHIIIGHIIIBDGBFCAHG@E=6?CBDBB;?BB@BD8BB;BDB<>>;@?BB<9>&5<?288AAABDBBBBACBCAC?@AD?CAC?
@SRR636272.19519409/2
GTGGCACCTTCTTCCCTTTCTTCCTCCGCCTGTCCCCCTGCGGCGGGGGGCTCCCCACAGCCGGCTTGGGCCGGGAGAGCATCATCCTGCTGCCGGGCCAGATCG
+
@@BFFFFFHGHHHJGGGIJJJIIIJEHDGHGFBGDHIJAFGBDHGFDB&555??BBC8?8?B7<;8>>B(8<59B599<(44:A:AC@>(:@:A>BB590<<(9?
Just out of curiosity: the paired-end adapter trimming may be the result of overlap analysis, but why does the --umi option cause more "adapter sequence" to be trimmed from the tail?
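One observation from the reads above that may be relevant. The interpretation is my own guess and is not confirmed by the fastp documentation or source. The extra 5 bases trimmed from each tail with --umi are exactly the reverse complement of the 5 bases (3 UMI + 2 skipped) removed from the start of the mate. A short Python check using only the read ends shown in this post:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

umi_len, umi_skip = 3, 2
cut = umi_len + umi_skip        # bases removed from each 5' end

r1_start = "GGCCC"              # first 5 bases of original read1
r2_start = "GTGGC"              # first 5 bases of original read2
tail_no_umi = "AGATCG"          # tail trimmed from both reads without --umi
r1_tail_umi = "GCCACAGATCG"     # tail trimmed from read1 with --umi
r2_tail_umi = "GGGCCAGATCG"     # tail trimmed from read2 with --umi

# Bases trimmed in addition to the 6 bp removed without --umi.
extra_r1 = r1_tail_umi[:-len(tail_no_umi)]
extra_r2 = r2_tail_umi[:-len(tail_no_umi)]

print(len(r1_tail_umi) == len(tail_no_umi) + cut)  # True
print(extra_r1 == revcomp(r2_start))               # True
print(extra_r2 == revcomp(r1_start))               # True
```

If the overlap analysis runs after the UMI bases are removed, then the end of each read would extend past the new, shortened start of its mate, and those 5 bases could be treated as overhang along with the 6 bp adapter. That would account for 11 bp instead of 6 bp, but I would appreciate confirmation of whether this is the intended behavior.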