database cluster by pure structure similarity

Open Wangchentong opened this issue 1 year ago • 2 comments

@milot-mirdita @martin-steinegger

Hi, I would like to ask a question about a technical detail:

  I want to cluster a database purely by structure similarity, for a purpose described in another issue.

  In foldseek search, the --alignment-type parameter (listed under misc) controls whether aa, 3di, or aa+3di is used for the alignment. There is no such option in the foldseek cluster command, but the following options look like they might relate to my purpose:

foldseek cluster -h
    prefilter:
	--seed-sub-mat TWIN              Substitution matrix file for k-mer generation [aa:3di.out,nucl:3di.out]
	--mask INT                       Mask sequences in k-mer stage: 0: w/o low complexity masking, 1: with low complexity masking [0]
        --mask-prob FLOAT                Mask sequences is probablity is above threshold [0.900]
    align:
	--alignment-mode INT             How to compute the alignment:
	                                  0: automatic
	                                  1: only score and end_pos
	                                  2: also start_pos and cov
	                                  3: also seq.id [3]
    clust:
	--similarity-type INT            Type of score used for clustering. 1: alignment score 2: sequence identity [2]
    common:
	--sub-mat TWIN                   Substitution matrix file [aa:3di.out,nucl:3di.out]

Here is my current command

foldseek cluster afDB af80_clusterDB tmp -c 0.8 --cluster-reassign --mask 1 --alignment-mode 2 --similarity-type 1

Thank you for this amazing tool! I hope to understand these parameters better, since the documentation I found says little about them. Any suggestions are much appreciated! 😉

Wangchentong, Jun 28 '23 09:06

--alignment-type should work in clustering; it also shows up in my help text. What version are you using? I recommend using the most recent version, since I properly implemented the 3Di-only search in the most recent commit.
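One way to check which case applies is sketched below, assuming a foldseek binary is on PATH and that it provides the `version` subcommand. It prints the installed version and searches the `cluster` help text for the option. The helper name `has_option` is hypothetical and not part of foldseek.

```shell
# Hypothetical helper (not part of foldseek): reads help text on stdin and
# succeeds if it mentions the option given as the first argument.
has_option() {
    grep -q -F -e "$1"
}

# Only run the check when a foldseek binary is actually installed.
if command -v foldseek >/dev/null 2>&1; then
    # ASSUMPTION: the installed build provides the `version` subcommand.
    foldseek version
    if foldseek cluster -h 2>&1 | has_option '--alignment-type'; then
        echo "cluster lists --alignment-type"
    else
        echo "cluster does not list --alignment-type; try a newer build"
    fi
fi
```

If the option is missing, the installed build is probably older than the commit mentioned above.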

--similarity-type 1 has no impact on the clustering and --cluster-reassign is currently not implemented.
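Taken together, the command from the question could be reduced along the following lines. This is a sketch rather than a confirmed recipe: it drops the two flags described above as having no effect, keeps the other flags from the original command unchanged, and assumes that `--alignment-type 0` selects the 3Di-only alignment in the installed build, which should be confirmed against the output of `foldseek cluster -h`. The function name `build_cluster_cmd` is hypothetical.

```shell
# Hypothetical helper: prints a foldseek cluster command line for review
# instead of running it.
# ASSUMPTION: alignment type 0 means 3Di-only; verify with `foldseek cluster -h`.
build_cluster_cmd() {
    input_db=$1
    output_db=$2
    tmp_dir=$3
    alignment_type=${4:-0}
    # --similarity-type 1 and --cluster-reassign are omitted: per the reply
    # above, the first has no impact and the second is not implemented.
    printf 'foldseek cluster %s %s %s -c 0.8 --mask 1 --alignment-mode 2 --alignment-type %s\n' \
        "$input_db" "$output_db" "$tmp_dir" "$alignment_type"
}

# Database and directory names are taken from the command in the question.
build_cluster_cmd afDB af80_clusterDB tmp
```

Printing the command first makes it easy to compare against the help text before starting a long clustering run.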

martin-steinegger, Jul 09 '23 09:07

I just dealt with this same issue. foldseek behaved as you describe when I installed it with conda (conda install -c conda-forge -c bioconda foldseek). However, both of the precompiled binaries for Linux show the --alignment-type option with easy-cluster for me. (Note: the download is https://mmseqs.com/foldseek/foldseek-linux-sse2.tar.gz, not https://mmseqs.com/foldseek/foldseek-linux-sse41.tar.gz as the readme says.)

dtischer, Jul 20 '23 01:07