BBTools
Deduplication of optical reads with clumpify doesn't work
Hi,
I wanted to use clumpify.sh to estimate and remove optical duplicates from my FASTQ files, but when I ran the command, the number of reads in the output was the same as in the input, even though it does find optical duplicates.
Here's one of the logs:
Version 39.33
Read Estimate: 818713
Memory Estimate: 624 MB
Memory Available: 8358 MB
Set groups to 1
Executing clump.KmerSort1 [in1=25-410_R1_001.fastq.gz, in2=25-410_R2_001.fastq.gz, out1=estimate_optical_duplicates/25-410_R1_001.markdup.fastq.gz, out2=estimate_optical_duplicates/25-410_R2_001.dedup.fastq.gz, groups=1, ecco=false, rename=false, shortname=f, unpair=false, repair=false, namesort=false, ow=true, dedupe=true, markduplicates=true]
Making comparator.
Made a comparator with k=31, seed=1, border=1, hashes=4
Starting cris 0.
Fetching reads.
Making fetch threads.
Starting threads.
Waiting for threads.
Fetch time: 4.344 seconds.
Closing input stream.
Combining thread output.
Combine time: 0.001 seconds.
Sorting.
Sort time: 0.743 seconds.
Making clumps.
Clump time: 0.560 seconds.
Deduping.
Dedupe time: 1.145 seconds.
Writing.
Waiting for writing to complete.
Write time: 6.892 seconds.
Done!
Time: 13.996 seconds.
Reads Processed: 247k 17.70k reads/sec
Bases Processed: 37158k 2.65m bases/sec
Reads In: 247726
Clumps Formed: 50157
Duplicates Found: 14432 5.826%
Reads Out: 247726
Bases Out: 37158900
Total time: 14.604 seconds.
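To make the mismatch concrete, here is a minimal sketch that pulls the counters out of the summary above and compares the reported output with what I would expect if the duplicates had been removed. The helper `parse_clumpify_summary` is my own hypothetical function, not part of BBTools, and it assumes the summary lines keep the `Name: value` layout shown in this log.

```python
import re


def parse_clumpify_summary(log_text):
    """Extract the integer counters from a clumpify.sh summary (hypothetical helper)."""
    fields = {}
    for key in ("Reads In", "Clumps Formed", "Duplicates Found", "Reads Out"):
        # Each counter is assumed to sit on its own line as "<key>: <integer> ..."
        match = re.search(rf"^{re.escape(key)}:\s+(\d+)", log_text, flags=re.MULTILINE)
        if match:
            fields[key] = int(match.group(1))
    return fields


# Summary lines copied from the log above.
log_text = """\
Reads In: 247726
Clumps Formed: 50157
Duplicates Found: 14432 5.826%
Reads Out: 247726
"""

summary = parse_clumpify_summary(log_text)
expected_out = summary["Reads In"] - summary["Duplicates Found"]

print("Reads out if duplicates were removed:", expected_out)
print("Reads out reported by clumpify:", summary["Reads Out"])
```

With the numbers from this run, I would expect 233294 reads out after removal, but the log reports 247726, which is identical to the input.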
Thanks,
Anaïs