
Bus error while trying to cluster a nearly 800GB FASTA file using mmseqs easy-linclust

Status: Open. dezhi0730 opened this issue 1 year ago (2 comments).

Hello,

I encountered a Bus error while trying to cluster a nearly 800 GB FASTA file with mmseqs easy-linclust. Below are my command, the error message, and my system configuration. I would appreciate your guidance on resolving this issue.

Command:

#!/bin/bash
#SBATCH --job-name=clust  # Job name
#SBATCH --output=logs/easy_clust_%j.log       # Output log file (%j will be replaced with the job ID)
#SBATCH --error=logs/easy_clust_%j.log         # Error log file (%j will be replaced with the job ID)
#SBATCH --ntasks=1                   # Number of tasks
#SBATCH --nodes=1                    # Number of nodes
#SBATCH --cpus-per-task=40
#SBATCH --gres=gpu:1                 # Number of GPUs
#SBATCH --partition=stdg_defq        # Partition name
#SBATCH --time=168:00:00               # Time limit (hh:mm:ss)
 
# Load necessary modules
module load mamba-24.3
source activate /exchange/xx
 
# Print job information
echo "Job ID: $SLURM_JOB_ID"
echo "Node List: $SLURM_JOB_NODELIST"
echo "Submit Directory: $SLURM_SUBMIT_DIR"
 
# Run the clustering
mmseqs easy-linclust \
    /dfs/is/home/x266288/data_process/assets/FASTA/merged_all.fasta \
    /dfs/is/home/x266288/data_process/assets/db/clustered/indi+oas/clustedRes \
    /dfs/is/home/x266288/data_process/tmp_dir/tmp_dir \
    --min-seq-id 0.95 --cov-mode 1 -c 1.0
 

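The parameter dump below shows "Split memory limit 0". As far as I understand the MMseqs2 help text, 0 means kmermatcher sizes its splits against all available system memory, which on a shared Slurm node may be more than this job is actually allowed to use. One thing worth trying is to request memory explicitly and cap the splits below that request. The sketch below is a hypothetical variant of the job, not a tested fix: `split_limit_from_mb` and `run_linclust` are illustrative names, the 70% headroom is an arbitrary starting point, and it assumes the script gains an `#SBATCH --mem=...` line so that Slurm sets `SLURM_MEM_PER_NODE` (in MB). Only `--split-memory-limit` and `--threads`, both of which appear in the parameter list below, are added to the original command.

```shell
#!/bin/bash
# Hypothetical helper (not part of MMseqs2): convert a memory amount in MB,
# e.g. $SLURM_MEM_PER_NODE, into a --split-memory-limit value in GB,
# keeping only the given percentage to leave headroom.
split_limit_from_mb() {
    local mem_mb=$1 pct=$2
    # Integer arithmetic: (MB * pct / 100) / 1024 MB per GB, rounded down.
    echo "$(( mem_mb * pct / 100 / 1024 ))G"
}

# Hypothetical wrapper around the command from the issue. Assumes the job was
# submitted with an explicit --mem request so that SLURM_MEM_PER_NODE is set.
run_linclust() {
    local input=$1 result=$2 tmp=$3 limit
    limit=$(split_limit_from_mb "${SLURM_MEM_PER_NODE:?set --mem in the job script}" 70)
    mmseqs easy-linclust "$input" "$result" "$tmp" \
        --min-seq-id 0.95 --cov-mode 1 -c 1.0 \
        --split-memory-limit "$limit" \
        --threads "${SLURM_CPUS_PER_TASK:-40}"
}
```

With `--mem=378000` (MB), `split_limit_from_mb 378000 70` yields `258G`. The last line of the job script would then call `run_linclust` with the same FASTA, result prefix, and tmp directory as above. A smaller limit should produce more, smaller splits at the cost of a longer run.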
Error Message:

Job ID: 192313
Node List: stdg22
Submit Directory: /home-cdo/x266288/data_process/utils
easy-linclust /dfs/is/home/x266288/data_process/assets/FASTA/merged_all.fasta /dfs/is/home/x266288/data_process/assets/db/clustered/indi+oas/clustedRes /dfs/is/home/x266288/data_process/tmp_dir/tmp_dir --min-seq-id 0.95 --cov-mode 1 -c 1.0
 
MMseqs Version:                         13.45111
Cluster mode                            0
Max connected component depth           1000
Similarity type                         2
Threads                                 40
Compressed                              0
Verbosity                               3
Substitution matrix                     nucl:nucleotide.out,aa:blosum62.out
Add backtrace                           false
Alignment mode                          0
Alignment mode                          0
Allow wrapped scoring                   false
E-value threshold                       0.001
Seq. id. threshold                      0.95
Min alignment length                    0
Seq. id. mode                           0
Alternative alignments                  0
Coverage threshold                      1
Coverage mode                           1
Max sequence length                     65535
Compositional bias                      1
Max reject                              2147483647
Max accept                              2147483647
Include identical seq. id.              false
Preload mode                            0
Pseudo count a                          1
Pseudo count b                          1.5
Score bias                              0
Realign hits                            false
Realign score bias                      -0.2
Realign max seqs                        2147483647
Gap open cost                           nucl:5,aa:11
Gap extension cost                      nucl:2,aa:1
Zdrop                                   40
Alphabet size                           nucl:5,aa:21
k-mers per sequence                     21
Spaced k-mers                           0
Spaced k-mer pattern                    
Scale k-mers per sequence               nucl:0.200,aa:0.000
Adjust k-mer length                     false
Mask residues                           1
Mask lower case residues                0
k-mer length                            0
Shift hash                              67
Split memory limit                      0
Include only extendable                 false
Skip repeating k-mers                   false
Rescore mode                            0
Remove hits by seq. id. and coverage    false
Sort results                            0
Remove temporary files                  true
Force restart with latest tmp           false
MPI runner                              
Database type                           0
Shuffle input database                  true
Createdb mode                           1
Write lookup file                       0
Offset of numeric ids                   0
 
linclust /dfs/is/home/x266288/data_process/tmp_dir/tmp_dir/1053738512421706396/input /dfs/is/home/x266288/data_process/tmp_dir/tmp_dir/1053738512421706396/clu /dfs/is/home/x266288/data_process/tmp_dir/tmp_dir/1053738512421706396/clu_tmp -e 0.001 --min-seq-id 0.95 -c 1 --cov-mode 1 --spaced-kmer-mode 0 --remove-tmp-files 1
 
Set cluster mode GREEDY MEM.
kmermatcher /dfs/is/home/x266288/data_process/tmp_dir/tmp_dir/1053738512421706396/input /dfs/is/home/x266288/data_process/tmp_dir/tmp_dir/1053738512421706396/clu_tmp/12397887837406899853/pref --sub-mat nucl:nucleotide.out,aa:blosum62.out --alph-size nucl:5,aa:13 --min-seq-id 0.95 --kmer-per-seq 21 --spaced-kmer-mode 0 --kmer-per-seq-scale nucl:0.200,aa:0.000 --adjust-kmer-len 0 --mask 0 --mask-lower-case 0 --cov-mode 1 -k 0 -c 1 --max-seq-len 65535 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --threads 40 --compressed 0 -v 3
 
kmermatcher /dfs/is/home/x266288/data_process/tmp_dir/tmp_dir/1053738512421706396/input /dfs/is/home/x266288/data_process/tmp_dir/tmp_dir/1053738512421706396/clu_tmp/12397887837406899853/pref --sub-mat nucl:nucleotide.out,aa:blosum62.out --alph-size nucl:5,aa:13 --min-seq-id 0.95 --kmer-per-seq 21 --spaced-kmer-mode 0 --kmer-per-seq-scale nucl:0.200,aa:0.000 --adjust-kmer-len 0 --mask 0 --mask-lower-case 0 --cov-mode 1 -k 0 -c 1 --max-seq-len 65535 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --threads 40 --compressed 0 -v 3
 
Database size: 2080936687 type: Nucleotide
 
Not enough memory to process at once need to split
[=================================================================] 2.08B 33m 39s 920ms
Process file into 11 parts
Generate k-mers list for 1 split
[=================================================================] 2.08B 37m 43s 776ms
 
Adjusted k-mer length 19
Sort kmer 0h 4m 42s 840ms
Sort by rep. sequence 0h 1m 40s 458ms
Generate k-mers list for 2 split
[=================================================================] 2.08B 37m 40s 661ms
 
Adjusted k-mer length 19
Sort kmer 0h 2m 55s 392ms
Sort by rep. sequence 0h 1m 43s 902ms
Generate k-mers list for 3 split
[=================================================================] 2.08B 36m 51s 84ms
 
Adjusted k-mer length 19
Sort kmer 0h 2m 55s 543ms
Sort by rep. sequence 0h 1m 41s 750ms
Generate k-mers list for 4 split
[=================================================================] 2.08B 37m 24s 796ms
 
Adjusted k-mer length 19
Sort kmer 0h 2m 52s 357ms
Sort by rep. sequence 0h 1m 40s 557ms
Generate k-mers list for 5 split  
[=================================================================] 2.08B 37m 57s 412ms
 
Adjusted k-mer length 19
Sort kmer 0h 2m 57s 804ms
Sort by rep. sequence 0h 1m 39s 453ms
Generate k-mers list for 6 split
[=================================================================] 2.08B 37m 10s 891ms
 
Adjusted k-mer length 19
Sort kmer 0h 2m 55s 794ms
Sort by rep. sequence 0h 1m 38s 542ms
Generate k-mers list for 7 split
[=================================================================] 2.08B 36m 53s 9ms
 
Adjusted k-mer length 19
Sort kmer 0h 2m 55s 788ms
Sort by rep. sequence 0h 1m 40s 551ms
Generate k-mers list for 8 split
[=================================================================] 2.08B 36m 54s 754ms
 
Adjusted k-mer length 19
Sort kmer 0h 2m 49s 532ms
Sort by rep. sequence 0h 1m 40s 244ms
Generate k-mers list for 9 split
[=================================================================] 2.08B 36m 24s 93ms
 
Adjusted k-mer length 19
Sort kmer 0h 2m 58s 556ms
Sort by rep. sequence 0h 1m 37s 893ms
Generate k-mers list for 10 split
[=================================================================] 2.08B 36m 46s 198ms
 
Adjusted k-mer length 19
Sort kmer 0h 2m 57s 392ms
Sort by rep. sequence 0h 1m 36s 238ms
Generate k-mers list for 11 split
[=================================================================
/dfs/is/home/x266288/data_process/tmp_dir/tmp_dir/1053738512421706396/clu_tmp/12397887837406899853/linclust.sh: line 26: 23857 Bus error               (core dumped) $RUNNER "$MMSEQS" kmermatcher "$INPUT" "${TMP_PATH}/pref" ${KMERMATCHER_PAR}
Error: kmermatcher died
Error: Search died
 

System Configuration:

MMseqs2 version: 13.45111
Memory: 378G

The error message suggests a problem with memory allocation or a hardware limitation, but I am not sure how to debug or fix it. Any suggestions or debugging tips would be greatly appreciated!
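One debugging angle: on Linux, a Bus error (SIGBUS) is commonly raised when a process touches a page of a memory-mapped file that has no backing storage, for example because the file was truncated or the filesystem holding it ran out of space. MMseqs2 memory-maps its database files, so checking free space in the tmp and output directories during the final split is a cheap first step. The helpers below are a hypothetical sketch (`free_gib` and `has_free_gib` are illustrative names) built only on POSIX `df` output.

```shell
#!/bin/bash
# Hypothetical debugging helper: print the free space, in whole GiB, of the
# filesystem that holds the given path. -P forces one line per filesystem
# (POSIX format) and -k reports sizes in 1 KiB blocks; column 4 is "Available".
free_gib() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Hypothetical helper: succeed only if at least $2 GiB are free under $1.
has_free_gib() {
    [ "$(free_gib "$1")" -ge "$2" ]
}
```

For example, `free_gib /dfs/is/home/x266288/data_process/tmp_dir/tmp_dir` could be run periodically while the 11th split is processed. After the job ends, `sacct -j 192313 --format=JobID,State,ExitCode,MaxRSS,ReqMem` reports how much memory Slurm recorded for the job versus what it requested.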

dezhi0730, Dec 16 '24 03:12

While the final split was being processed, the program hung for a long time. However, htop showed that a large portion of memory was still available and CPU utilization was low.
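Note that htop shows node-wide memory. If the cluster enforces Slurm memory limits through cgroups (an assumption about this site's configuration), the job can hit its own limit while the node still looks mostly free. The sketch below reads the limit of the current cgroup from inside the job; `cgroup_mem_limit_gib` is a hypothetical name, and it assumes cgroup v2 (a `0::<path>` line in `/proc/self/cgroup`). On cgroup v1 the relevant file is `memory.limit_in_bytes` instead, and a limit can also be set on a parent cgroup, so "unlimited" at this level is not conclusive.

```shell
#!/bin/bash
# Hypothetical helper: print the memory limit of the current cgroup in GiB,
# "unlimited" if none is set at this level, or "unknown" if it cannot be read.
# Assumes cgroup v2, where /proc/self/cgroup contains a "0::<path>" line.
cgroup_mem_limit_gib() {
    local rel file raw
    rel=$(awk -F: '$1 == "0" { print $3 }' /proc/self/cgroup 2>/dev/null)
    file="/sys/fs/cgroup${rel}/memory.max"
    [ -r "$file" ] || { echo "unknown"; return 0; }
    raw=$(cat "$file")
    # "max" means no limit is configured on this cgroup itself.
    if [ "$raw" = "max" ]; then
        echo "unlimited"
    else
        echo "$(( raw / 1024**3 ))"
    fi
}
```

Running it from inside the batch script, next to the `echo` lines that print the job ID, would show whether the job sees a limit well below the 378G installed on the node.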

dezhi0730, Dec 17 '24 03:12

When I ran the same command again, I got this error message:

Adjusted k-mer length 19
Sort kmer 0h 2m 52s 516ms
Sort by rep. sequence 0h 1m 36s 729ms
Generate k-mers list for 11 split
[=================================================================
/dfs/is/home/x266288/data_process/tmp_dir/temp_dir/11178384644005550917/clu_tmp/9926663674530773728/linclust.sh: line 26: 12453 Killed                  $RUNNER "$MMSEQS" kmermatcher "$INPUT" "${TMP_PATH}/pref" ${KMERMATCHER_PAR}
Error: kmermatcher died
Error: Search died
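The two runs died differently: the first with "Bus error" (SIGBUS) and the second with "Killed" (SIGKILL). SIGKILL is what the kernel OOM killer sends, and Slurm can also kill a job step that exceeds its memory allocation, although these are not the only possible sources. Shells report death by signal N as exit status 128 + N, so the exit code Slurm records for the job helps tell these cases apart. The helper below is a hypothetical sketch (`signal_from_status` is an illustrative name) that uses the bash builtin `kill -l` to name the signal.

```shell
#!/bin/bash
# Hypothetical helper: name the signal behind a shell exit status.
# Shells report "terminated by signal N" as status 128 + N.
signal_from_status() {
    local status=$1
    if (( status > 128 )); then
        kill -l "$(( status - 128 ))"
    else
        echo "none"
    fi
}
```

`signal_from_status 137` prints `KILL`. On x86-64 Linux, where SIGBUS is signal 7, `signal_from_status 135` prints `BUS`.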

dezhi0730, Dec 17 '24 06:12