BBTools

Deduplication of optical reads with clumpify doesn't work

Open · AJL07 opened this issue 4 months ago · 0 comments

Hi,

I wanted to use clumpify.sh to estimate and remove optical duplicates from my FASTQ files. However, when I ran the command, the output contained the same number of reads as the input, even though it does find optical duplicates ("Duplicates Found: 14432" in the log below).
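
A quick way to confirm the read counts, and to see whether duplicates were tagged rather than dropped, is to scan the output FASTQ directly. The Python sketch below is hypothetical: the function names and the `count_dups.py` file name are made up for illustration. It assumes standard 4-line FASTQ records. It also assumes that `markduplicates=t` appends ` duplicate` to a read's name instead of removing the read, which is how the clumpify.sh usage text describes that flag. Check that behaviour against the usage text of your BBTools version.

```python
import gzip
import sys


def count_records(lines, tag=" duplicate"):
    """Count 4-line FASTQ records and the headers that end with `tag`."""
    total = marked = 0
    for i, line in enumerate(lines):
        if i % 4 == 0:  # first line of each record is the @header
            total += 1
            if line.rstrip().endswith(tag):
                marked += 1
    return total, marked


def count_fastq(path, tag=" duplicate"):
    """Open a plain or gzipped FASTQ file and count its records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return count_records(fh, tag)


if __name__ == "__main__":
    # Usage: python count_dups.py in.fastq.gz out.fastq.gz ...
    for path in sys.argv[1:]:
        total, marked = count_fastq(path)
        print(f"{path}: {total} reads, {marked} tagged as duplicates")
```

For example, `python count_dups.py estimate_optical_duplicates/25-410_R1_001.markdup.fastq.gz`. If the read total stays unchanged but the tagged count is non-zero, the duplicates were marked rather than removed.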

Here's one of the logs:

```
Version 39.33

Read Estimate:          818713
Memory Estimate:        624 MB
Memory Available:       8358 MB
Set groups to 1
Executing clump.KmerSort1 [in1=25-410_R1_001.fastq.gz, in2=25-410_R2_001.fastq.gz, out1=estimate_optical_duplicates/25-410_R1_001.markdup.fastq.gz, out2=estimate_optical_duplicates/25-410_R2_001.dedup.fastq.gz, groups=1, ecco=false, rename=false, shortname=f, unpair=false, repair=false, namesort=false, ow=true, dedupe=true, markduplicates=true]

Making comparator.
Made a comparator with k=31, seed=1, border=1, hashes=4
Starting cris 0.
Fetching reads.
Making fetch threads.
Starting threads.
Waiting for threads.
Fetch time: 	4.344 seconds.
Closing input stream.
Combining thread output.
Combine time: 	0.001 seconds.
Sorting.
Sort time: 	0.743 seconds.
Making clumps.
Clump time: 	0.560 seconds.
Deduping.
Dedupe time: 	1.145 seconds.
Writing.
Waiting for writing to complete.
Write time: 	6.892 seconds.
Done!
Time:                         	13.996 seconds.
Reads Processed:          247k 	17.70k reads/sec
Bases Processed:        37158k 	2.65m bases/sec

Reads In:                        247726
Clumps Formed:           50157
Duplicates Found:        14432	5.826%
Reads Out:                     247726
Bases Out:                     37158900
Total time: 	14.604 seconds.
```
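
The summary numbers can also be cross-checked against each other. The hypothetical helpers below parse two parts of the log: the parameter echo on the `Executing clump.KmerSort1 [...]` line, which here shows both `dedupe=true` and `markduplicates=true`, and the `Reads In` / `Duplicates Found` / `Reads Out` counts. `LOG` is a trimmed excerpt of the log above, with the parameter list shortened to a few keys. `expected_out_if_removed` assumes that each counted duplicate would be dropped as a single read. With paired input the exact accounting may differ, so treat this as a sketch rather than clumpify's own arithmetic.

```python
import re

# Trimmed excerpt of the log above: the parameter echo is shortened to a few keys.
LOG = """\
Executing clump.KmerSort1 [groups=1, ecco=false, ow=true, dedupe=true, markduplicates=true]
Reads In:                        247726
Clumps Formed:           50157
Duplicates Found:        14432\t5.826%
Reads Out:                     247726
"""


def parse_params(log):
    """Return the key=value pairs echoed inside 'Executing <class> [...]' as strings."""
    m = re.search(r"Executing \S+ \[(.*?)\]", log)
    params = {}
    if m:
        for item in m.group(1).split(", "):
            key, _, value = item.partition("=")
            params[key] = value
    return params


def parse_counts(log):
    """Return the integer counts from the summary lines at the end of the log."""
    counts = {}
    for label in ("Reads In", "Duplicates Found", "Reads Out"):
        m = re.search(rf"^{label}:\s+(\d+)", log, re.MULTILINE)
        if m:
            counts[label] = int(m.group(1))
    return counts


def check(log):
    """Compare the duplicate count with the number of reads that actually disappeared."""
    params = parse_params(log)
    counts = parse_counts(log)
    reads_in = counts["Reads In"]
    dups = counts["Duplicates Found"]
    reads_out = counts["Reads Out"]
    return {
        "dedupe": params.get("dedupe"),
        "markduplicates": params.get("markduplicates"),
        "dup_percent": round(100 * dups / reads_in, 3),
        # Assumption: each counted duplicate would be dropped as one read.
        "expected_out_if_removed": reads_in - dups,
        "reads_removed": reads_in - reads_out,
    }


if __name__ == "__main__":
    print(check(LOG))
```

On this excerpt, `dup_percent` comes out as 5.826, which matches the log. The script reports `expected_out_if_removed` as 233294 but `reads_removed` as 0. In other words, the duplicates were counted, but no reads were dropped from the output.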

Thanks,
Anaïs

AJL07, Sep 12 '25 15:09