cdhit

speeding up small word size

Open YiJessePi opened this issue 5 years ago • 0 comments

I'm trying to cluster ~1 million protein sequences at 50% identity. When I clustered at 60% identity I used n=4, and it took a few hours with 20 threads. But after reducing the word size to 3 it takes a very long time (something like 20k sequences per day). I wanted to use n=4 for 50% identity as well, but that is not possible. Any suggestions for how to speed it up? Thanks!
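For reference, the two runs described above can be sketched as command lines. This is only a minimal sketch: the file names `proteins.fasta`, `clusters60`, and `clusters50` are placeholders, the helper names are hypothetical, and the threshold-to-word-size table is taken from the ranges suggested in the cd-hit user guide (`-n 5` for 0.7 to 1.0, `-n 4` for 0.6 to 0.7, `-n 3` for 0.5 to 0.6, `-n 2` for 0.4 to 0.5), not from the exact commands used.

```python
import shlex

# Word sizes suggested by the cd-hit user guide for protein clustering.
# Each entry is (lower bound of the identity range, value for -n),
# listed from the highest range to the lowest.
WORD_SIZE_BY_THRESHOLD = [(0.7, 5), (0.6, 4), (0.5, 3), (0.4, 2)]


def word_size_for(identity):
    """Return the suggested -n value for an identity threshold (-c)."""
    for lower_bound, word_size in WORD_SIZE_BY_THRESHOLD:
        if identity >= lower_bound:
            return word_size
    raise ValueError("cd-hit is not intended for thresholds below 0.4")


def cdhit_command(fasta_in, out_prefix, identity, threads=20):
    """Build a cd-hit argument list (hypothetical helper, placeholder paths)."""
    return [
        "cd-hit",
        "-i", fasta_in,                      # input FASTA file
        "-o", out_prefix,                    # output file prefix
        "-c", str(identity),                 # sequence identity threshold
        "-n", str(word_size_for(identity)),  # word size
        "-T", str(threads),                  # number of threads
    ]


if __name__ == "__main__":
    # The 60% run (n=4) and the 50% run (n=3) from the description above.
    print(shlex.join(cdhit_command("proteins.fasta", "clusters60", 0.6)))
    print(shlex.join(cdhit_command("proteins.fasta", "clusters50", 0.5)))
```

The table makes the constraint explicit: at a threshold of 0.5 the suggested word size drops from 4 to 3, which is the slow configuration in question.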

YiJessePi, Dec 03 '19 20:12