FindMyFriends
cdhitGrouping: Too low cluster threshold for the word length.
Hello Thomas,
I have installed the GitHub version and am having trouble with the cdhitGrouping function. The problem occurs both when I run the function on my own dataset and when I follow the example:
testPG <- .loadPgExample()
testPG <- cdhitGrouping(testPG)
Here is the error message:
Error in cdhitC(options, name, showProgress): Fatal Error: %s Program halted !!
Too low cluster threshold for the word length. Increase the threshold or the tolerance, or decrease the word length.
Traceback:
- cdhitGrouping(testPG)
- cdhitGrouping(testPG)
- .local(object, ...)
- precluster(object, kmerSize[1], maxLengthDif, geneChunkSize, cdhitOpts)
- lapply(seq_len(nChunks), function(i) {
      if (i != 1 && interactive()) cat("\n")
      cdhit(genes(object, subset = seq.int(chunks$start[i], chunks$end[i])), cdhitOpts, "Preclustering")
  })
- lapply(seq_len(nChunks), function(i) {
      if (i != 1 && interactive()) cat("\n")
      cdhit(genes(object, subset = seq.int(chunks$start[i], chunks$end[i])), cdhitOpts, "Preclustering")
  })
- FUN(X[[i]], ...)
- cdhit(genes(object, subset = seq.int(chunks$start[i], chunks$end[i])), cdhitOpts, "Preclustering")
- cdhitC(options, name, showProgress)
I would really appreciate some help with this issue! Thank you very much! Best, Xin
testPG <- .loadPgExample()
testPG <- cdhitGrouping(testPG)
Error in cdhitC(options, name, showProgress) : Fatal Error: %s Program halted !!
Too low cluster threshold for the word length. Increase the threshold or the tolerance, or decrease the word length.
I solved it by explicitly specifying the -c parameter:
-c sequence identity threshold, default 0.9
this is the default cd-hit's "global sequence identity" calculated as:
number of identical amino acids in alignment
divided by the full length of the shorter sequence
In R:
pg <- cdhitGrouping(pg, cdhitOpts=list(c=.9))
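To make the quoted definition of -c concrete, here is a minimal sketch in plain R. Both helpers are hypothetical illustrations, not part of FindMyFriends or cd-hit: `global_identity()` applies the formula above to two strings that are assumed to be already aligned (equal length, "-" for gaps), and `suggested_word_size()` encodes the threshold ranges that the cd-hit user guide recommends for the protein word length (-n). How FindMyFriends maps its kmerSize argument onto that word length is an assumption on my part, based only on the traceback.

```r
# Hypothetical helper: global identity as described in the cd-hit help text,
# i.e. identical residues in the alignment divided by the full length of the
# shorter sequence. Inputs are assumed to be pre-aligned strings of equal
# length that use "-" for gaps.
global_identity <- function(aligned_a, aligned_b) {
  a <- strsplit(aligned_a, "")[[1]]
  b <- strsplit(aligned_b, "")[[1]]
  stopifnot(length(a) == length(b))
  identical_positions <- sum(a == b & a != "-")
  shorter_length <- min(sum(a != "-"), sum(b != "-"))
  identical_positions / shorter_length
}

# Hypothetical helper: word size suggested by the cd-hit user guide for a
# given protein identity threshold (-c). Returns NA below 0.4, where the
# guide gives no recommendation.
suggested_word_size <- function(threshold) {
  if (threshold >= 0.7) {
    5
  } else if (threshold >= 0.6) {
    4
  } else if (threshold >= 0.5) {
    3
  } else if (threshold >= 0.4) {
    2
  } else {
    NA
  }
}

# Made-up toy alignment: 6 identical residues, shorter sequence has 7 residues
identity <- global_identity("MKTAYIAK", "MKTA-IAR")
identity >= 0.9            # would this pair pass the default threshold?
suggested_word_size(0.9)   # word size the guide suggests for c = 0.9
```

If the threshold that actually reaches cd-hit is lower than the word length allows, cd-hit stops with the "Too low cluster threshold for the word length" message, which would explain why setting c explicitly helps.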
HTH