foldseek
database cluster by pure structure similarity
@milot-mirdita @martin-steinegger
Hi, I would like to ask about a technical detail.
I want to cluster a database purely by structure similarity, for a purpose described in another issue.
In foldseek search, the misc parameter --alignment-type controls whether aa, 3di, or aa+3di is used for the alignment. I do not see this option in the foldseek cluster command, but the following options might relate to my purpose:
foldseek cluster -h
prefilter:
--seed-sub-mat TWIN Substitution matrix file for k-mer generation [aa:3di.out,nucl:3di.out]
--mask INT Mask sequences in k-mer stage: 0: w/o low complexity masking, 1: with low complexity masking [0]
--mask-prob FLOAT Mask sequences is probablity is above threshold [0.900]
align:
--alignment-mode INT How to compute the alignment:
0: automatic
1: only score and end_pos
2: also start_pos and cov
3: also seq.id [3]
clust:
--similarity-type INT Type of score used for clustering. 1: alignment score 2: sequence identity [2]
common:
--sub-mat TWIN Substitution matrix file [aa:3di.out,nucl:3di.out]
Here is my current command:
foldseek cluster afDB af80_clusterDB tmp -c 0.8 --cluster-reassign --mask 1 --alignment-mode 2 --similarity-type 1
Thank you for this amazing tool! I hope to understand these parameters better, since the documentation I found says little about them. Any suggestions are much appreciated! 😉
--alignment-type should work in the clustering. It also shows up in my help text. What version are you using? I recommend using the most recent version, since I properly implemented the 3Di-only search in the most recent commit.
--similarity-type 1 has no impact on the clustering, and --cluster-reassign is currently not implemented.
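For anyone hitting the same thing, a quick way to see whether a given build lists the flag is to grep its help text. This is only a sketch: help_has_flag is a helper name made up here, and it assumes foldseek is on PATH and that `foldseek version` and `foldseek cluster -h` print to the terminal as usual.

```shell
#!/bin/sh
# help_has_flag is a hypothetical helper (not part of foldseek):
# it returns 0 if the help text on stdin mentions the given flag.
help_has_flag() {
    # -F: match the flag literally; -e: allow a pattern starting with "--"
    grep -q -F -e "$1"
}

# Only query foldseek when it is actually installed.
if command -v foldseek >/dev/null 2>&1; then
    foldseek version
    if foldseek cluster -h 2>&1 | help_has_flag '--alignment-type'; then
        echo "cluster lists --alignment-type"
    else
        echo "cluster does not list --alignment-type; try a newer build"
    fi
fi
```

If the flag is listed, the clustering command from the question can be rerun with --alignment-type set to whichever value that help text gives for 3Di-only alignment.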
I just dealt with this identical issue. foldseek behaved as you describe when I installed it with conda (conda install -c conda-forge -c bioconda foldseek). However, both of the precompiled binaries for Linux show the --alignment-type option with easy-cluster for me. (Note: there is https://mmseqs.com/foldseek/foldseek-linux-sse2.tar.gz rather than https://mmseqs.com/foldseek/foldseek-linux-sse41.tar.gz as the readme says.)
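A rough sketch of switching from the conda package to the precompiled SSE2 build follows. The URL is the one given above; fetch_foldseek is a made-up helper name, wget is assumed to be installed, and the foldseek/bin/foldseek path inside the archive is an assumption about its layout, so check it after unpacking.

```shell
#!/bin/sh
# URL of the precompiled SSE2 Linux build mentioned in this thread.
FOLDSEEK_URL="https://mmseqs.com/foldseek/foldseek-linux-sse2.tar.gz"

# fetch_foldseek is a hypothetical helper: download and unpack into DEST.
fetch_foldseek() {
    dest="$1"
    mkdir -p "$dest" &&
    wget -q -O "$dest/foldseek.tar.gz" "$FOLDSEEK_URL" &&
    tar -xzf "$dest/foldseek.tar.gz" -C "$dest"
}

# Example use (not run automatically; the unpacked path is an assumption):
#   fetch_foldseek "$HOME/opt"
#   "$HOME/opt/foldseek/bin/foldseek" easy-cluster -h | grep -F -e '--alignment-type'
```

Calling the binary by its full path keeps the conda-installed foldseek from shadowing it while you compare the two help texts.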