cdhit
Running cd-hit on a 26 Gb data set is too slow
My nucleic acid data set is about 26 Gb with 29,938,643 sequences. The software has been running for a month with the parameters '-T 85 -M 100000 -c 0.9', but it does not seem to have started clustering. Could you give me some advice on how to run cd-hit on a data set this large, or suggest other software that can replace cd-hit?
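For reference, here is what those three flags mean, written as a small Python sketch that only builds the command line and does not run anything. It assumes the nucleotide program `cd-hit-est` (the one used later in this thread), and the file names `reads.fasta` and `reads_c90` are hypothetical. Per the CD-HIT user guide, `-c` is the identity threshold, `-M` is the memory limit in MB, and `-T` is the number of threads.

```python
import shlex


def cdhit_est_command(fasta_in, out_prefix, identity=0.9, memory_mb=100000, threads=85):
    """Build (but do not run) a cd-hit-est command line.

    -c: sequence identity threshold (0.9 means 90% identity)
    -M: memory limit in MB (100000 MB is roughly 100 GB; 0 means unlimited)
    -T: number of threads (0 means use all CPUs)
    """
    return [
        "cd-hit-est",
        "-i", fasta_in,      # input FASTA file
        "-o", out_prefix,    # output prefix for representatives and .clstr file
        "-c", str(identity),
        "-M", str(memory_mb),
        "-T", str(threads),
    ]


# Hypothetical file names, for illustration only.
argv = cdhit_est_command("reads.fasta", "reads_c90")
print(shlex.join(argv))
```

This prints `cd-hit-est -i reads.fasta -o reads_c90 -c 0.9 -M 100000 -T 85`, i.e. the settings from the post above.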
Hi, I'm having a similar issue. Did you find a way to speed up cd-hit, or maybe another piece of software that does the same thing? Thanks!!
Same issue here. Increasing the number of CPUs didn't seem to increase the speed at all. It processed only ~10k sequences (0.1% of my data) in a week running on 30 CPUs. Please let me know how to work with data this large. Thanks
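To put those numbers in perspective, here is a back-of-the-envelope projection. It assumes the rate stays constant at ~10k sequences per week, which is a simplification: the real rate may change as clustering progresses. If 10k sequences are 0.1% of the data, the whole set is about 10 million sequences.

```python
processed_seqs = 10_000   # sequences finished after one week (from the post)
fraction_done = 0.001     # 0.1% of the data set
weeks_elapsed = 1

# Total size implied by "10k is 0.1%": about 10 million sequences.
total_seqs = round(processed_seqs / fraction_done)
rate_per_week = processed_seqs / weeks_elapsed

# Assumption: the rate stays constant for the whole run.
weeks_needed = total_seqs / rate_per_week
years_needed = weeks_needed / 52.18  # ~52.18 weeks per year

print(f"total ~{total_seqs:,} sequences")
print(f"~{weeks_needed:,.0f} weeks (~{years_needed:.1f} years) at the same rate")
```

At that pace the full run would take on the order of 1,000 weeks, roughly 19 years, which is why simply waiting longer is not a realistic option here.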
I have the same question. I use cd-hit-est with a FASTA file as input. Maybe a FASTA file is slower than an encoded database file? Have you solved this? Could you give me some advice? Thanks
Hello, to be honest, I chose to use mmseqs in the end. It may be faster on big data sets. Good luck.
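For anyone trying the mmseqs route, here is a minimal sketch that builds an MMseqs2 command in Python without running it. It uses the `easy-linclust` workflow, the linear-time clustering mode intended for very large sequence sets (`easy-cluster` is the more sensitive alternative), with `--min-seq-id 0.9` to mirror the `-c 0.9` used with cd-hit. The file names, the `tmp` directory and the thread count are hypothetical, and you should check the MMseqs2 documentation for your version regarding nucleotide input and coverage settings.

```python
import shlex


def mmseqs_linclust_command(fasta_in, out_prefix, tmp_dir, min_seq_id=0.9, threads=32):
    """Build (but do not run) an `mmseqs easy-linclust` command line.

    Positional arguments: input FASTA, output prefix, temporary directory.
    --min-seq-id: minimum sequence identity for cluster members
    --threads:    number of CPU threads (32 is an arbitrary example value)
    """
    return [
        "mmseqs", "easy-linclust",
        fasta_in, out_prefix, tmp_dir,
        "--min-seq-id", str(min_seq_id),
        "--threads", str(threads),
    ]


# Hypothetical paths, for illustration only.
argv = mmseqs_linclust_command("reads.fasta", "clusterRes", "tmp")
print(shlex.join(argv))
```

Once the paths point at real files, the list can be passed to `subprocess.run(argv, check=True)`.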