FindMyFriends

cdhitGrouping: Too low cluster threshold for the word length.

Open xif033 opened this issue 7 years ago • 2 comments

Hello Thomas,

I have installed the GitHub version and have some issues with the cdhitGrouping function. The issue occurs both when I run the function on my own dataset and when I follow the example:

testPG <- .loadPgExample()
testPG <- cdhitGrouping(testPG)

Here is the error message:

Error in cdhitC(options, name, showProgress): Fatal Error: %s Program halted !!

Too low cluster threshold for the word length. Increase the threshold or the tolerance, or decrease the word length.

Traceback:

  1. cdhitGrouping(testPG)
  2. cdhitGrouping(testPG)
  3. .local(object, ...)
  4. precluster(object, kmerSize[1], maxLengthDif, geneChunkSize, cdhitOpts)
  5. lapply(seq_len(nChunks), function(i) {
         if (i != 1 && interactive())
             cat("\n")
         cdhit(genes(object, subset = seq.int(chunks$start[i], chunks$end[i])),
             cdhitOpts, "Preclustering")
     })
  6. lapply(seq_len(nChunks), function(i) {
         if (i != 1 && interactive())
             cat("\n")
         cdhit(genes(object, subset = seq.int(chunks$start[i], chunks$end[i])),
             cdhitOpts, "Preclustering")
     })
  7. FUN(X[[i]], ...)
  8. cdhit(genes(object, subset = seq.int(chunks$start[i], chunks$end[i])),
         cdhitOpts, "Preclustering")
  9. cdhitC(options, name, showProgress)

I would really appreciate some help with this issue! Thank you very much!

Best,
Xin

xif033 • Aug 16 '17 19:08

testPG <- .loadPgExample()
testPG <- cdhitGrouping(testPG)
Error in cdhitC(options, name, showProgress) : Fatal Error: %s Program halted !!

Too low cluster threshold for the word length. Increase the threshold or the tolerance, or decrease the word length.
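The error means cd-hit received an identity threshold (-c) that is too low for its word length (-n). For protein clustering, the cd-hit user guide recommends -n 5 for thresholds 0.7 to 1.0, -n 4 for 0.6 to 0.7, -n 3 for 0.5 to 0.6 and -n 2 for 0.4 to 0.5. The R sketch that follows encodes only that guideline. `recommended_word_size` and `word_size_too_large` are hypothetical helpers, not part of FindMyFriends or cd-hit, and cd-hit's own check also takes the tolerance into account, so this is an approximation rather than a reproduction of cd-hit's internal test.

```r
# Hypothetical helper (not part of FindMyFriends or cd-hit): largest protein
# word size (-n) recommended by the cd-hit user guide for a threshold (-c):
#   -n 5 for 0.7-1.0, -n 4 for 0.6-0.7, -n 3 for 0.5-0.6, -n 2 for 0.4-0.5
recommended_word_size <- function(threshold) {
  stopifnot(is.numeric(threshold), threshold > 0, threshold <= 1)
  if (threshold >= 0.7) 5L
  else if (threshold >= 0.6) 4L
  else if (threshold >= 0.5) 3L
  else if (threshold >= 0.4) 2L
  else NA_integer_  # below 0.4 the guide recommends no protein word size
}

# Hypothetical check: TRUE if the -c / -n pair falls outside the guideline
word_size_too_large <- function(threshold, word_size) {
  max_n <- recommended_word_size(threshold)
  is.na(max_n) || word_size > max_n
}

word_size_too_large(0.9, 5)  # FALSE: a 0.9 threshold allows -n 5
word_size_too_large(0.6, 5)  # TRUE: a 0.6 threshold calls for -n 4 or smaller
```

Under this guideline, raising -c or lowering -n brings the pair back into range, which matches the advice in the error message.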

abrozzi • Apr 27 '20 14:04

I solved it by specifying the -c parameter:

-c      sequence identity threshold, default 0.9
        this is the default cd-hit's "global sequence identity" calculated as:
        number of identical amino acids in alignment
        divided by the full length of the shorter sequence
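To make that definition concrete, here is a minimal R sketch, assuming the two sequences are already aligned and `-` marks a gap. `global_identity` is a hypothetical helper, not part of cd-hit or FindMyFriends; cd-hit computes this value internally from its own alignment.

```r
# Hypothetical helper: global sequence identity as described in the cd-hit
# help text for -c, i.e. identical residues in the alignment divided by the
# full (ungapped) length of the shorter sequence.
global_identity <- function(aln1, aln2, gap = "-") {
  a <- strsplit(aln1, "")[[1]]
  b <- strsplit(aln2, "")[[1]]
  stopifnot(length(a) == length(b))  # aligned strings must have equal length
  # Aligned columns where both residues match and neither is a gap
  n_ident <- sum(a == b & a != gap)
  # Ungapped lengths of the two original sequences
  len1 <- sum(a != gap)
  len2 <- sum(b != gap)
  n_ident / min(len1, len2)
}

global_identity("MKVLA-T", "MKILAGT")
# [1] 0.8333333
```

Here 5 of the aligned columns are identical and the shorter sequence (MKVLAT) has 6 residues, so the identity is 5/6. That pair would be clustered together at -c 0.8 but not at the default 0.9.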

In R:

pg <- cdhitGrouping(pg, cdhitOpts=list(c=.9))

HTH

abrozzi • Apr 27 '20 14:04