process stuck at rescorediagonal when do cluster

Open Wangchentong opened this issue 2 years ago • 7 comments

Expected Behavior

I am clustering 15 million sequences, but the run gets stuck at rescorediagonal. Watching with top, I don't think it is a memory issue: only 8 GB of RAM is in use, and the machine has 128 threads and 400 GB of RAM, which I believe is enough for this clustering.

The strange thing is that when I reduce the number of sequences to 7.5 million, it works just fine. I am confused about what makes it get stuck.
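For anyone trying to narrow this down, top alone does not show whether the process is busy, sleeping, or waiting on I/O. The following is a minimal Linux-only sketch that reads this from /proc. The helper name `proc_summary` is hypothetical, and the demo inspects the current shell so that it runs anywhere; for a real run, pass the PID of the mmseqs process instead.

```shell
#!/bin/sh
# Linux-only sketch: summarize one process from /proc/<pid>/status.
# proc_summary is a hypothetical helper name, not part of MMseqs2.
proc_summary() {
  pid="$1"
  # State is reported for the main thread:
  #   R = running, S = sleeping, D = uninterruptible wait (often I/O)
  # Threads is the number of threads, VmRSS the resident memory.
  grep -E '^(State|Threads|VmRSS):' "/proc/$pid/status"
}

# Demo on the current shell. For the stuck run, something like
#   proc_summary "$(pgrep -n mmseqs)"
# would inspect the newest process named mmseqs instead.
proc_summary "$$"
```

A main thread that stays in state S with near-zero CPU usage would point toward a wait or deadlock, while D would point toward storage, for example a slow shared filesystem. Per-thread states are available under /proc/<pid>/task/.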

Current Behavior

mmseqs cluster tmpDB DB_clu tmp --min-seq-id 0.3 --threads 128
cluster tmpDB DB_clu tmp --min-seq-id 0.3 --threads 128

MMseqs Version:                       67949d702dbfc6e5d54fdd0f14a9ab6740f11c32
Substitution matrix                   aa:blosum62.out,nucl:nucleotide.out
Seed substitution matrix              aa:VTML80.out,nucl:nucleotide.out
Sensitivity                           4
k-mer length                          0
k-score                               seq:2147483647,prof:2147483647
Alphabet size                         aa:21,nucl:5
Max sequence length                   65535
Max results per query                 20
Split database                        0
Split mode                            2
Split memory limit                    0
Coverage threshold                    0.8
Coverage mode                         0
Compositional bias                    1
Compositional bias                    1
Diagonal scoring                      true
Exact k-mer matching                  0
Mask residues                         1
Mask residues probability             0.9
Mask lower case residues              0
Minimum diagonal score                15
Selected taxa
Include identical seq. id.            false
Spaced k-mers                         1
Preload mode                          0
Pseudo count a                        substitution:1.100,context:1.400
Pseudo count b                        substitution:4.100,context:5.800
Spaced k-mer pattern
Local temporary path
Threads                               128
Compressed                            0
Verbosity                             3
Add backtrace                         false
Alignment mode                        3
Alignment mode                        0
Allow wrapped scoring                 false
E-value threshold                     0.001
Seq. id. threshold                    0.3
Min alignment length                  0
Seq. id. mode                         0
Alternative alignments                0
Max reject                            2147483647
Max accept                            2147483647
Score bias                            0
Realign hits                          false
Realign score bias                    -0.2
Realign max seqs                      2147483647
Correlation score weight              0
Gap open cost                         aa:11,nucl:5
Gap extension cost                    aa:1,nucl:2
Zdrop                                 40
Rescore mode                          0
Remove hits by seq. id. and coverage  false
Sort results                          0
Cluster mode                          0
Max connected component depth         1000
Similarity type                       2
Single step clustering                false
Cascaded clustering steps             3
Cluster reassign                      false
Remove temporary files                false
Force restart with latest tmp         false
MPI runner
k-mers per sequence                   21
Scale k-mers per sequence             aa:0.000,nucl:0.200
Adjust k-mer length                   false
Shift hash                            67
Include only extendable               false
Skip repeating k-mers                 false

Set cluster sensitivity to -s 5.000000
Set cluster mode SET COVER
Set cluster iterations to 3
linclust tmpDB tmp/12397138995521121878/clu_redundancy tmp/12397138995521121878/linclust --cluster-mode 0 --max-iterations 1000 --similarity-type 2 --threads 128 --compressed 0 -v 3 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 0 --alignment-mode 3 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.001 --min-seq-id 0.3 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0.8 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --alph-size aa:13,nucl:5 --kmer-per-seq 21 --spaced-kmer-mode 1 --kmer-per-seq-scale aa:0.000,nucl:0.200 --adjust-kmer-len 0 --mask 0 --mask-prob 0.9 --mask-lower-case 0 -k 0 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --rescore-mode 0 --filter-hits 0 --sort-results 0 --remove-tmp-files 0 --force-reuse 0

kmermatcher tmpDB tmp/12397138995521121878/linclust/639052995728955397/pref --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --alph-size aa:13,nucl:5 --min-seq-id 0.3 --kmer-per-seq 21 --spaced-kmer-mode 1 --kmer-per-seq-scale aa:0.000,nucl:0.200 --adjust-kmer-len 0 --mask 0 --mask-prob 0.9 --mask-lower-case 0 --cov-mode 0 -k 0 -c 0.8 --max-seq-len 65535 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --threads 128 --compressed 0 -v 3

kmermatcher tmpDB tmp/12397138995521121878/linclust/639052995728955397/pref --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --alph-size aa:13,nucl:5 --min-seq-id 0.3 --kmer-per-seq 21 --spaced-kmer-mode 1 --kmer-per-seq-scale aa:0.000,nucl:0.200 --adjust-kmer-len 0 --mask 0 --mask-prob 0.9 --mask-lower-case 0 --cov-mode 0 -k 0 -c 0.8 --max-seq-len 65535 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --threads 128 --compressed 0 -v 3

Database size: 15000000 type: Aminoacid
Reduced amino acid alphabet: (A S T) (C) (D B N) (E Q Z) (F Y) (G) (H) (I V) (K R) (L J M) (P) (W) (X)

Generate k-mers list for 1 split
[=================================================================] 100.00% 15.00M 15s 758ms
Sort kmer 0h 0m 0s 932ms
Sort by rep. sequence 0h 0m 0s 784ms
Time for fill: 0h 0m 4s 19ms
Time for merging to pref: 0h 0m 0s 5ms
Time for processing: 0h 0m 26s 640ms
rescorediagonal tmpDB tmpDB tmp/12397138995521121878/linclust/639052995728955397/pref tmp/12397138995521121878/linclust/639052995728955397/pref_rescore1 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --rescore-mode 0 --wrapped-scoring 0 --filter-hits 0 -e 0.001 -c 0.8 -a 0 --cov-mode 0 --min-seq-id 0.5 --min-aln-len 0 --seq-id-mode 0 --add-self-matches 0 --sort-results 0 --db-load-mode 0 --threads 128 --compressed 0 -v 3
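The log ends at the rescorediagonal call, so one way to tell a true hang from a very slow step might be to run a single command under a time limit. The sketch below is only an illustration: `run_with_limit` is a hypothetical helper, it relies on `timeout` from GNU coreutils, and the demo uses `sleep` as a stand-in so that it runs without MMseqs2 installed.

```shell
#!/bin/sh
# Run a command under a time limit and report whether it finished.
# run_with_limit is a hypothetical helper name. `timeout` (GNU coreutils)
# exits with status 124 when the limit is reached.
run_with_limit() {
  limit="$1"
  shift
  timeout "$limit" "$@"
  status=$?
  if [ "$status" -eq 124 ]; then
    echo "timed out after $limit"
  else
    echo "finished with status $status"
  fi
}

# Demo with a stand-in command that outlives its 1-second limit.
# For the real case, the rescorediagonal command printed in the log
# could be passed here instead, with a much longer limit such as 2h.
run_with_limit 1 sleep 3
```

Since the 7.5 million sequence run completes, timing the same step there first would give a rough baseline for choosing the limit.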

Wangchentong avatar Sep 06 '22 12:09 Wangchentong

It would be nice to get some advice, @milot-mirdita.

Wangchentong avatar Sep 06 '22 12:09 Wangchentong

I ran into the same issue. How did you end up solving it?

LittletreeZou avatar Dec 18 '23 16:12 LittletreeZou

No clue. I only encountered this issue on a node of a Slurm cluster. I would recommend running it on another machine rather than trying to fix this problem. @LittletreeZou

Wangchentong avatar Dec 19 '23 07:12 Wangchentong

By "another device", do you mean a non slurm cluster?

LittletreeZou avatar Dec 19 '23 11:12 LittletreeZou

Yes, I later used a non-Slurm 64-core machine to run all of the MMseqs2 and Foldseek procedures. By the way, Foldseek showed the same issue for me.

Wangchentong avatar Dec 19 '23 14:12 Wangchentong

Can you try using fewer threads (--threads 32 or 64) on the same machine?
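As a rough sketch of such a rerun on a Slurm node, the thread count could be capped at what the job was actually allocated. This assumes the job was submitted with --cpus-per-task so that Slurm sets SLURM_CPUS_PER_TASK; otherwise it falls back to `nproc`. The helper name `pick_threads` is hypothetical, and the last line only prints the command from the original report instead of running it.

```shell
#!/bin/sh
# Cap a requested thread count at the CPUs available to this job.
# pick_threads is a hypothetical helper name. SLURM_CPUS_PER_TASK is only
# set by Slurm when --cpus-per-task was given; otherwise nproc is used.
pick_threads() {
  requested="$1"
  available="${SLURM_CPUS_PER_TASK:-$(nproc)}"
  if [ "$requested" -gt "$available" ]; then
    echo "$available"
  else
    echo "$requested"
  fi
}

# Dry run: print the clustering command with the capped thread count.
# Remove the echo to actually run it.
echo mmseqs cluster tmpDB DB_clu tmp --min-seq-id 0.3 --threads "$(pick_threads 32)"
```

Whether oversubscribed threads are related to the hang is not established in this thread; this only rules out one possible difference between the Slurm node and a standalone machine.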

milot-mirdita avatar Dec 26 '23 04:12 milot-mirdita

I have already tried using fewer threads and a larger number of database splits. I can't figure out whether it is a memory issue or a thread contention issue; at least, neither of these two strategies worked for me. I can provide access to my machine and a user account for you to debug if you wish.


Wangchentong avatar Dec 26 '23 14:12 Wangchentong