
Duplication rate reported differently in v0.20.1 compared to v0.23.1

Open waimunleong opened this issue 3 years ago • 3 comments

Hi developer,

I am using fastp to trim my dataset. With v0.20.1, the duplication rate was ~6.7%. With v0.23.1 and the same command, parameters and dataset, the duplication rate increased to ~72%. All other statistics are the same. May I get your opinion on this? The output from both versions is attached for your reference. Thanks.

Command used for both versions: fastp -w 16 -q 20 -l 50 -a AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA -i S200034436_L01_103_1.fq.gz --adapter_sequence_r2 AAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG -I S200034436_L01_103_2.fq.gz -o trimmed_1.fq.gz -O trimmed_2.fq.gz --correction -R Redcliffe-QC-Report --html Raw_Data_QC_Report.html

v0.20.1: [attached screenshot: Fastp_stats_v20]

v0.23.1: [attached screenshot: Fastp_stats_v23]

waimunleong avatar Apr 18 '22 07:04 waimunleong
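
For anyone who wants to compare the two runs numerically rather than from screenshots, here is a minimal sketch. It assumes each fastp run also wrote a JSON report and that the report stores the value under `duplication` → `rate` as a fraction between 0 and 1; the file names `fastp_v0.20.1.json` and `fastp_v0.23.1.json` are hypothetical, and the helper names are made up for illustration.

```python
import json
import sys


def duplication_rate(report: dict) -> float:
    """Return the duplication rate (a fraction) from a parsed fastp JSON report.

    Assumes the report has a top-level "duplication" object with a "rate" key.
    """
    return float(report["duplication"]["rate"])


def compare_reports(old: dict, new: dict) -> dict:
    """Summarise the duplication rates of two reports as percentages."""
    old_pct = duplication_rate(old) * 100
    new_pct = duplication_rate(new) * 100
    return {
        "old_pct": old_pct,
        "new_pct": new_pct,
        "diff_pct_points": new_pct - old_pct,
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare_dup.py fastp_v0.20.1.json fastp_v0.23.1.json
    with open(sys.argv[1]) as fh_old, open(sys.argv[2]) as fh_new:
        summary = compare_reports(json.load(fh_old), json.load(fh_new))
    for key, value in summary.items():
        print(f"{key}: {value:.2f}")
```

This only reads back what each version reported; it does not say which estimate is closer to the truth.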

Could you calculate the duplication rate from the BAM file? That can serve as the ground truth.

Please update here when you have the BAM duplication rate. v0.23.1 should be more accurate than v0.20.1.

sfchen avatar Apr 18 '22 07:04 sfchen
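
One way to obtain a BAM-based duplication rate, sketched under some assumptions: the reads have been aligned, duplicates have already been marked (for example with `samtools markdup` or Picard MarkDuplicates), and the output of `samtools flagstat` has been saved to a text file. The helper names below are hypothetical, and dividing marked duplicates by the total record count is a simplification; other denominators (primary or mapped reads only) are equally defensible.

```python
import re

# Matches flagstat lines such as "250 + 0 duplicates":
# QC-passed count, QC-failed count, then a label.
_LINE = re.compile(r"^(\d+) \+ (\d+) (.+)$")


def parse_flagstat(text: str) -> dict:
    """Extract QC-passed 'total' and 'duplicates' counts from flagstat text."""
    counts = {}
    for raw in text.splitlines():
        match = _LINE.match(raw.strip())
        if not match:
            continue
        passed = int(match.group(1))
        label = match.group(3)
        if label.startswith("in total"):
            counts["total"] = passed
        elif label.startswith("duplicates"):
            counts["duplicates"] = passed
    return counts


def bam_duplication_rate(text: str) -> float:
    """Return marked duplicates divided by total records (a fraction)."""
    counts = parse_flagstat(text)
    if "total" not in counts or "duplicates" not in counts:
        raise ValueError("flagstat text lacks 'in total' or 'duplicates' line")
    if counts["total"] == 0:
        raise ValueError("no reads in flagstat output")
    return counts["duplicates"] / counts["total"]
```

Note that a BAM-based figure covers only the reads that survived trimming and were written to the alignment, so a small gap between it and the fastp estimate is not by itself a sign of a bug.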

Thanks a lot, @sfchen. When the data has UMIs, which JSON report gives the more accurate duplication estimate: the one produced with --umi or the one produced without it?

asmlgkj avatar Apr 27 '22 06:04 asmlgkj

Hi @asmlgkj, I have encountered the same problem with v0.20.1 and v0.23.1. The difference is around twofold. Did you find a solution, or is this difference expected? Thanks.

ASBioinfo avatar Jun 28 '23 07:06 ASBioinfo