fastp
Duplication rate reported differently in v0.20.1 compared to v0.23.1
Hi developer,
I am using fastp to trim my dataset. With v0.20.1, the duplication rate was ~6.7%. With v0.23.1, using the same command, parameters, and dataset, the duplication rate is ~72%. All other statistics are the same; only the duplication rate differs. May I get your opinion on this? The output from both versions is attached for reference. Thanks.
Command used for both versions: fastp -w 16 -q 20 -l 50 -a AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA -i S200034436_L01_103_1.fq.gz --adapter_sequence_r2 AAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG -I S200034436_L01_103_2.fq.gz -o trimmed_1.fq.gz -O trimmed_2.fq.gz --correction -R Redcliffe-QC-Report --html Raw_Data_QC_Report.html
v20.1

v23.1
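For anyone who wants to compare the two runs without reading the HTML reports, here is a minimal Python sketch that pulls the duplication rate out of two fastp JSON reports. It assumes the JSON report has a top-level "duplication" object with a "rate" field holding a fraction, which may not hold for every fastp version, and the file names in the usage note are hypothetical.

```python
import json


def load_report(path):
    """Read a fastp JSON report from disk into a dict."""
    with open(path) as handle:
        return json.load(handle)


def duplication_rate(report):
    """Return the duplication rate as a fraction.

    Assumption: the report has a top-level "duplication" object
    with a "rate" field.
    """
    return report["duplication"]["rate"]


def compare(old_report, new_report):
    """Summarize the duplication rates of two reports and their difference."""
    old = duplication_rate(old_report)
    new = duplication_rate(new_report)
    return {"old": old, "new": new, "difference": new - old}
```

Usage would look like compare(load_report("v0.20.1.json"), load_report("v0.23.1.json")), where each file is the JSON report written by one of the two runs.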

Could you calculate the duplication rate from the BAM file? That can serve as the ground truth.
Please update here when you have the BAM duplication rate. v0.23.1 should be more accurate than v0.20.1.
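One possible way to get the BAM duplication rate is sketched below in Python. It assumes duplicates have already been marked in the BAM (for example with samtools markdup or Picard MarkDuplicates) and that the records are piped in as SAM text. The function name is made up for illustration; the flag values come from the SAM specification. Unmapped reads are counted in the total, so the result can differ slightly from what other tools report.

```python
DUPLICATE = 0x400      # SAM flag: PCR or optical duplicate
SECONDARY = 0x100      # SAM flag: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag: supplementary alignment


def duplicate_rate(sam_lines):
    """Return the fraction of primary records flagged as duplicates."""
    total = 0
    duplicates = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        total += 1
        if flag & DUPLICATE:
            duplicates += 1
    return duplicates / total if total else 0.0
```

To use it, pipe the output of samtools view for the duplicate-marked BAM into a script that calls duplicate_rate(sys.stdin) and prints the result.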
Thanks a lot, @sfchen. When the data has UMIs, which JSON report gives the more accurate duplication estimate: the one produced with --umi or the one produced without it?
Hi @asmlgkj, I have encountered the same problem with v0.20.1 and v0.23.1. The difference is about twofold. Did you find a solution, or is this result OK? Thanks.