Clustering problem: the sequences length is less than 14 residues

Open jiaweiguan opened this issue 5 months ago • 0 comments

Hi foldseek team! I am using the clustering function of foldseek. It sometimes works and sometimes it doesn't. Here is an example and my question. The error message suggests that the sequences have too few amino acids, but the proteins I uploaded are in fact long enough.

command

foldseek easy-cluster ./dataset/ ./res/ ./tmp/

dataset

dataset.zip
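
To double-check the lengths, something like the following could be used. It is only a sketch: it assumes the structures are PDB-format files with a `.pdb` extension (the archive may contain other formats such as mmCIF, which this does not handle), and `count_ca` and `report_lengths` are hypothetical helper names, not part of foldseek. It counts C-alpha atoms as a stand-in for residues, so files with alternate locations or multiple models would be over-counted.

```shell
# Count C-alpha atoms (roughly one per residue) in a PDB-format file.
# Assumes fixed-column PDB records: record name in columns 1-6,
# atom name in columns 13-16. HETATM records (e.g. calcium ions) are skipped.
count_ca() {
  awk 'substr($0, 1, 4) == "ATOM" && substr($0, 13, 4) == " CA " { n++ }
       END { print n + 0 }' "$1"
}

# Print "<file><TAB><C-alpha count>" for every .pdb file in a directory.
report_lengths() {
  for f in "$1"/*.pdb; do
    [ -e "$f" ] || continue   # no .pdb files matched the glob
    printf '%s\t%s\n' "$f" "$(count_ca "$f")"
  done
}
```

Running `report_lengths ./dataset` would then list each structure with its approximate residue count, which can be compared against the 14-residue limit named in the error message.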

Foldseek Output (for bugs)

Query database size: 2 type: Aminoacid
Estimated memory consumption: 977M
Target database size: 2 type: Aminoacid
Index table k-mer threshold: 154 at k-mer size 6 
Index table: counting k-mers
[=================================================================] 2 0s 0ms
Index table: Masked residues: 0
No k-mer could be extracted for the database ./input_step_redundancy_ss.
Maybe the sequences length is less than 14 residues.
Error: Prefilter step 0 died
Error: Search died

Environment

foldseek Version: 9.427df8a

jiaweiguan, Sep 13 '24 06:09