localcolabfold raises error "malloc(): invalid size (unsorted)" during expandaln

Open yanj14jy15 opened this issue 9 months ago • 27 comments

I was testing localcolabfold 1.5.5 with a list of protein complexes. colabfold_search with GPU seems to produce all a3m files without any issues but when I tried to use CPU, I got errors "corrupted size vs. prev_size".

Then I tried individual complexes; some worked while others didn't. I got the error message "malloc(): invalid size (unsorted)" during expandaln. I've tried with and without indices, indices built with different hardware configurations, and mmseqs15, but none of these resolved the issue. An example is attached.

colabfold_search-62781089.txt complex2.txt localcolabfold_msa.txt

yanj14jy15 avatar Mar 09 '25 18:03 yanj14jy15

I think I am having a similar issue. I am using colabfold_search with a single protein sequence. I am running the following command on a CPU: MMSEQS_NO_INDEX=1 colabfold_search test1_0.fasta results --threads 16 --prefilter-mode 1

The code starts to run successfully, but similarly fails at the expandaln call. The specific command that causes the failure is: expandaln results/qdb ../MSA_Databases/uniref30_2302_db.idx results/res ../MSA_Databases/uniref30_2302_db.idx results/res_exp --db-load-mode 1 --threads 1 --expansion-mode 0 -e inf --expand-filter-clusters 1 --max-seq-id 0.95

I have tried a variety of possible options and tracked the memory output, ensuring that I still have memory available. Depending on the exact environment, I get 2 possible errors:

  1. "malloc(): invalid size (unsorted)" (as above)
  2. "free(): corrupted unsorted chunks"

alatham13 avatar Mar 10 '25 18:03 alatham13

@alatham13 Thanks for sharing the information. I think I also encountered the second error message but probably 1-2 times. Could you please share/attach your log files for troubleshooting?

yanj14jy15 avatar Mar 10 '25 20:03 yanj14jy15

@yanj14jy15 Thank you! Here is an example log file

log.txt

alatham13 avatar Mar 10 '25 21:03 alatham13

I think this issue occurs because the database was built with an older version of MMseqs2, while the latest version of colabfold_search uses the latest version of MMseqs2. Using the binaries from an older version of MMseqs2 fixed this issue for me. However, folding with colabfold_batch using a templates .m8 file and a local database that I built now gives me this error:

Traceback (most recent call last):
  File "/usr/local/envs/colabfold/lib/python3.11/site-packages/colabfold/batch.py", line 1461, in run
    = get_msa_and_templates(jobname, query_seqs_unique, unpaired_msa, result_dir, 'single_sequence', use_templates,
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/envs/colabfold/lib/python3.11/site-packages/colabfold/batch.py", line 781, in get_msa_and_templates
    template_feature = mk_template(
                       ^^^^^^^^^^^^
  File "/usr/local/envs/colabfold/lib/python3.11/site-packages/colabfold/batch.py", line 133, in mk_template
    hhsearch_hits = pipeline.parsers.parse_hhr(hhsearch_result)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/envs/colabfold/lib/python3.11/site-packages/alphafold/data/parsers.py", line 505, in parse_hhr
    hits.append(_parse_hhr_hit(lines[block_starts[i]:block_starts[i + 1]]))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/envs/colabfold/lib/python3.11/site-packages/alphafold/data/parsers.py", line 446, in _parse_hhr_hit
    groups = _get_hhr_line_regex_groups(patt, line[17:])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/envs/colabfold/lib/python3.11/site-packages/alphafold/data/parsers.py", line 379, in _get_hhr_line_regex_groups
    raise RuntimeError(f'Could not parse query line {line}')
RuntimeError: Could not parse query line  

punit-jha123 avatar Mar 10 '25 23:03 punit-jha123

I used mmseqs17 in colabfold-conda bin to build the index and perform colabfold_search. So I tend to think it's not the issue in my case. I also tried mmseqs15 but the issue was not resolved.

yanj14jy15 avatar Mar 11 '25 00:03 yanj14jy15

Hello, I ran into a similar problem during mmseqs expandaln. My versions of colabfold and mmseqs are both the latest. I have tried many versions of mmseqs, but every time the result was the same, as shown below.

Invalid database read for database data file=XML/db/hits_db.idx, database index=XML/db/hits_db.idx.index getData: local id (4294967295) >= db size (22)
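
The local id in this message, 4294967295, equals 2^32 - 1, the largest value of an unsigned 32-bit integer, which is also what -1 becomes when stored in such a type. A value like this usually points to a failed lookup rather than a real entry number, although whether that is what happened inside MMseqs2 here is an assumption. A quick shell check of the arithmetic (the function name is made up for this sketch):

```shell
# as_uint32 N: print N reinterpreted as an unsigned 32-bit integer.
# Illustrative helper only; it is not part of MMseqs2 or ColabFold.
as_uint32() {
  echo $(( $1 & 0xFFFFFFFF ))
}

as_uint32 -1              # the value that appears as "local id" in the error
echo $(( (1 << 32) - 1 )) # the same number, written as 2^32 - 1
```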

log.txt

dklurker avatar Mar 11 '25 04:03 dklurker

Thank you @punit-jha123. Moving to MMseqs release 15 solved the issue for me.

alatham13 avatar Mar 14 '25 19:03 alatham13

@alatham13, thank you. Yes, we do have a bug in the current version. The current version works for expanded databases on GPU, but not on CPU. For the CPU version, version 15 is needed. You also need to build the DBs with the respective version. We are planning to fix this soon.
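
Since the databases need to be built with the same MMseqs2 version that later searches them, one way to guard against a mismatch is to record the builder's version next to the database and compare it before searching. The sketch below relies on the `mmseqs version` subcommand; the `.built_with` sidecar file and the function names are assumptions of this example, not something MMseqs2 or ColabFold creates.

```shell
# Print the version string of the given mmseqs binary (default: mmseqs on PATH).
mmseqs_version() {
  "${1:-mmseqs}" version
}

# stamp_db DB_PREFIX [MMSEQS_BIN]: remember which binary built the database.
# The ".built_with" file is a hypothetical convention for this sketch.
stamp_db() {
  mmseqs_version "${2:-mmseqs}" > "$1.built_with"
}

# check_db DB_PREFIX [MMSEQS_BIN]: succeed only if the recorded version
# matches the version of the binary about to be used for searching.
check_db() {
  local db="$1" bin="${2:-mmseqs}"
  [ -f "$db.built_with" ] || { echo "no version stamp for $db" >&2; return 2; }
  [ "$(cat "$db.built_with")" = "$(mmseqs_version "$bin")" ] || {
    echo "MMseqs2 version mismatch for $db" >&2
    return 1
  }
}
```

Calling `stamp_db` right after `createindex` and `check_db` right before `colabfold_search` would turn a silent mismatch into an explicit error.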

martin-steinegger avatar May 11 '25 07:05 martin-steinegger

@dklurker Have you solved this bug? I have the same problem.

camel2000 avatar May 14 '25 11:05 camel2000

@camel2000 Please refer to this page. I solved this problem by following "Starting from scratch" in https://github.com/sokrypton/ColabFold/wiki/Creating-expandable-search-databases

dklurker avatar May 16 '25 07:05 dklurker

@dklurker thanks for the reminder about "Starting from scratch".

Conclusions:

  1. GPU search doubled the speed compared with the CPU version
  2. I still fail at the expandaln step when I run the GPU version of the pipeline

Below are the scripts I'm using:

build_database.bash

export CUDA_VISIBLE_DEVICES=1
export MMSEQS_CALL_DEPTH=1

DATA_DIR=/mnt/localssd/yakunli/data/level-1/demo
FASTA=/home/yakun_li_genbio_ai/project/msa/examples/DB.fasta
DBNAME=targetDB
####################################################################################################

mmseqs createdb ${FASTA} ${DATA_DIR}/seqdb
# mmseqs makepaddedseqdb ${DATA_DIR}/seqdb_ ${DATA_DIR}/seqdb

# parameter choice is very important here, generally you want to cluster to a low sequence identity however keep a high coverage.
# Without a high coverage, we might lose a domain in the representative sequence and then not be able to find the domain in any of the members anymore, since we always first need to match the cluster representative
mmseqs cluster ${DATA_DIR}/seqdb ${DATA_DIR}/clu ${DATA_DIR}/tmp --min-seq-id 0.3 -c 0.8
# disable E-value threshold with -e inf, accept everything that was clustered
mmseqs align ${DATA_DIR}/seqdb ${DATA_DIR}/seqdb ${DATA_DIR}/clu ${DATA_DIR}/aln -a -e inf
mmseqs result2profile ${DATA_DIR}/seqdb ${DATA_DIR}/seqdb ${DATA_DIR}/aln ${DATA_DIR}/prof
mmseqs profile2consensus ${DATA_DIR}/prof ${DATA_DIR}/cons
mmseqs prefixid ${DATA_DIR}/cons ${DATA_DIR}/${DBNAME}.tsv --tsv --threads 1
mmseqs prefixid ${DATA_DIR}/seqdb ${DATA_DIR}/${DBNAME}_seq.tsv --tsv --threads 1
mmseqs prefixid ${DATA_DIR}/seqdb_h ${DATA_DIR}/${DBNAME}_h.tsv --tsv --threads 1
mmseqs prefixid ${DATA_DIR}/aln ${DATA_DIR}/${DBNAME}_aln.tsv --tsv --threads 1
mmseqs tsv2exprofiledb ${DBNAME} ${DATA_DIR}/${DBNAME}


mmseqs createindex ${DATA_DIR}/${DBNAME} ${DATA_DIR}/tmp  

#build GPU padded database
mmseqs makepaddedseqdb ${DATA_DIR}/${DBNAME} ${DATA_DIR}/${DBNAME}_gpu

#build GPU database index
mmseqs createindex ${DATA_DIR}/${DBNAME}_gpu ${DATA_DIR}/tmp --index-subset 2 --split 1 


nohup mmseqs gpuserver ${DATA_DIR}/${DBNAME}_gpu > ${DATA_DIR}/gpuserver.log 2>&1 &

cpu version search pipeline:

start_time=$(date +%s) 
mkdir -p ${result_dir}
mmseqs createdb ${input_fasta_path} ${result_dir}/qdb

mmseqs search ${result_dir}/qdb ${database_path} \
  ${result_dir}/res ${result_dir}/tmp \
  --num-iterations 3 --db-load-mode 0 -s 8 -e 0.1 --max-seqs 10000 -a #--threads 128


end_time=$(date +%s)    # record end time
elapsed_time=$((end_time - start_time))  # compute time cost
echo "search cost: $elapsed_time s"


mmseqs expandaln ${result_dir}/qdb ${database_path}.idx \
    ${result_dir}/res ${database_path}.idx ${result_dir}/res_exp \
  --db-load-mode 2 --expansion-mode 0 -e inf \
  --expand-filter-clusters 1 --max-seq-id 0.95

mmseqs mvdb ${result_dir}/tmp/latest/profile_1 ${result_dir}/prof_res

mmseqs lndb ${result_dir}/qdb_h ${result_dir}/prof_res_h

mmseqs align ${result_dir}/prof_res ${database_path}.idx \
  ${result_dir}/res_exp ${result_dir}/res_exp_realign \
  --db-load-mode 2 -e 10 --max-accept 100000 --alt-ali 10 -a

mmseqs filterresult ${result_dir}/qdb ${database_path}.idx \
  ${result_dir}/res_exp_realign ${result_dir}/res_exp_realign_filter \
  --db-load-mode 2 --qid 0 --qsc 0.8 --diff 0 --max-seq-id 1.0 \
  --filter-min-enable 100


mmseqs result2msa ${result_dir}/qdb ${database_path}.idx \
  ${result_dir}/res_exp_realign_filter ${result_dir}/${out_a3m}  \
  --msa-format-mode 6 --db-load-mode 2 --filter-msa 1 \
  --filter-min-enable 1000 --diff 3000 \
  --qid 0.0,0.2,0.4,0.6,0.8,1.0 --qsc 0 --max-seq-id 0.95


head ${result_dir}/${out_a3m}

mmseqs rmdb ${result_dir}/res_exp_realign_filter
mmseqs rmdb ${result_dir}/res_exp_realign
mmseqs rmdb "${result_dir}/res_exp"
mmseqs rmdb ${result_dir}/res

mmseqs rmdb ${result_dir}/qdb
mmseqs rmdb ${result_dir}/qdb_h


end_time=$(date +%s)    # record end time
elapsed_time=$((end_time - start_time))  # compute time cost

echo "cost: $elapsed_time s"

gpu version pipeline:

start_time=$(date +%s) 
mkdir -p ${result_dir}
mmseqs createdb ${input_fasta_path} ${result_dir}/qdb

mmseqs search ${result_dir}/qdb ${database_path}/${online_serve}_gpu ${result_dir}/res_gpu ${result_dir}/tmp \
 --num-iterations 3 --db-load-mode 0 -a -e 0.1 --max-seqs 10000 --gpu 1 --prefilter-mode 1 --gpu-server 1 # --threads 64 


end_time=$(date +%s)    # record end time
elapsed_time=$((end_time - start_time))  # compute time cost
echo "search cost: $elapsed_time s"

mmseqs expandaln ${result_dir}/qdb \
  ${database_path}/${online_serve}_gpu.idx \
  ${result_dir}/res_gpu \
  ${database_path}/${online_serve}_gpu.idx \
  ${result_dir}/res_exp \
  --db-load-mode 0 --threads 64 --expansion-mode 0 -e inf \
  --expand-filter-clusters 0 --max-seq-id 0.95 

camel2000 avatar May 16 '25 12:05 camel2000

What is the error?

martin-steinegger avatar May 18 '25 05:05 martin-steinegger

Now I can use the GPU to build a database and then retrieve an MSA from input_DB.fasta. But for colabfold_envdb_202108, should we build the database from the .fasta, or just start from the tsv2exprofiledb step? I tried to retrieve an MSA from colabfold_envdb_202108 as follows:

  1. Download the data (http://wwwuser.gwdg.de/~compbiol/colabfold/colabfold_envdb_202108.tar.gz)
  2. Run tsv2exprofiledb and createindex
  3. Search for the MSA (search + expandaln + lndb + align + ....)

This failed in the expandaln step with the error "malloc(): invalid size (unsorted)". The commands used in step 2:

mmseqs tsv2exprofiledb  ${DATA_BASE_DIR}/${DBNAME} ${DATA_BASE_DIR}/${DBNAME} --gpu 1
mmseqs createindex ${DATA_BASE_DIR}/${DBNAME} ${DATA_BASE_DIR}/tmp  --split 1 --index-subset 2

The database build script is below (this script works well when building a database from a .fasta file):

mmseqs createdb ${FASTA} ${DATA_DIR}/seqdb

# parameter choice is very important here, generally you want to cluster to a low sequence identity however keep a high coverage.
# Without a high coverage, we might lose a domain in the representative sequence and then not be able to find the domain in any of the members anymore, since we always first need to match the cluster representative
mmseqs cluster ${DATA_DIR}/seqdb ${DATA_DIR}/clu ${DATA_DIR}/tmp --min-seq-id 0.3 -c 0.8
# disable E-value threshold with -e inf, accept everything that was clustered
mmseqs align ${DATA_DIR}/seqdb ${DATA_DIR}/seqdb ${DATA_DIR}/clu ${DATA_DIR}/aln -a -e inf
mmseqs result2profile ${DATA_DIR}/seqdb ${DATA_DIR}/seqdb ${DATA_DIR}/aln ${DATA_DIR}/prof
mmseqs profile2consensus ${DATA_DIR}/prof ${DATA_DIR}/cons
mmseqs prefixid ${DATA_DIR}/cons ${DATA_DIR}/${DBNAME}.tsv --tsv --threads 1
mmseqs prefixid ${DATA_DIR}/seqdb ${DATA_DIR}/${DBNAME}_seq.tsv --tsv --threads 1
mmseqs prefixid ${DATA_DIR}/seqdb_h ${DATA_DIR}/${DBNAME}_h.tsv --tsv --threads 1
mmseqs prefixid ${DATA_DIR}/aln ${DATA_DIR}/${DBNAME}_aln.tsv --tsv --threads 1
mmseqs tsv2exprofiledb ${DBNAME} ${DATA_DIR}/${DBNAME} --gpu 1


mmseqs createindex ${DATA_DIR}/${DBNAME} ${DATA_DIR}/tmp  

# # build GPU padded database
mmseqs makepaddedseqdb ${DATA_DIR}/${DBNAME} ${DATA_DIR}/${DBNAME}_gpu

# # build GPU database index
mmseqs createindex ${DATA_DIR}/${DBNAME}_gpu ${DATA_DIR}/tmp --index-subset 2 --split 1 

camel2000 avatar May 18 '25 22:05 camel2000

@yanj14jy15 I think MMseqs2 version 15 does not support GPU. How were you able to test it with GPU?

camel2000 avatar May 18 '25 22:05 camel2000

The following MMseqs2 build should fix the issue for both CPU and GPU: https://mmseqs.com/archive/8783404eab75833dcb865153ed2e146431649efa

You can download the precompiled binary above (likely the GPU-enabled Linux binary: https://mmseqs.com/archive/8783404eab75833dcb865153ed2e146431649efa/mmseqs-linux-gpu.tar.gz ) and pass the mmseqs binary contained within to colabfold_search --mmseqs path-to-binary.

Please let me know if this works.
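
The download-and-use steps above might be scripted roughly as follows. This is a sketch: the URL is the one given above, the function names and directory names are placeholders, and the binary is located with find so that no particular archive layout has to be assumed.

```shell
# URL of the GPU-enabled Linux build mentioned above.
MMSEQS_URL="https://mmseqs.com/archive/8783404eab75833dcb865153ed2e146431649efa/mmseqs-linux-gpu.tar.gz"

# download_mmseqs TARBALL: fetch the archive to the given path.
download_mmseqs() {
  curl -fL -o "$1" "$MMSEQS_URL"
}

# unpack_mmseqs TARBALL DEST: extract the archive and print the path of the
# mmseqs binary found inside it.
unpack_mmseqs() {
  local tarball="$1" dest="$2" bin
  mkdir -p "$dest" || return 1
  tar -xzf "$tarball" -C "$dest" || return 1
  bin=$(find "$dest" -type f -name mmseqs | head -n 1)
  [ -n "$bin" ] || { echo "no mmseqs binary found in $tarball" >&2; return 1; }
  printf '%s\n' "$bin"
}

# Usage (not executed here; the paths are placeholders):
#   download_mmseqs "$HOME/mmseqs-gpu.tar.gz"
#   bin=$(unpack_mmseqs "$HOME/mmseqs-gpu.tar.gz" "$HOME/mmseqs-8783404")
#   colabfold_search --mmseqs "$bin" query.fasta /path/to/databases results
```

Keeping the test build in its own directory leaves the mmseqs binary that ships with localcolabfold untouched, so switching back only means dropping the --mmseqs argument.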

milot-mirdita avatar May 19 '25 09:05 milot-mirdita

Thanks for your work, I hope this version works. Also, I wonder why setup_databases.sh does not run makepaddedseqdb on the uniref30 and colabfold_envdb_202108 databases when GPU is enabled.

jianzhang-lu avatar May 19 '25 09:05 jianzhang-lu

tsv2exprofiledb in setup_databases.sh calls makepaddedseqdb

milot-mirdita avatar May 19 '25 10:05 milot-mirdita

@milot-mirdita I tried https://mmseqs.com/archive/8783404eab75833dcb865153ed2e146431649efa , but it still failed at 'expandaln'.

error:

terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_S_construct null not valid
/home/yakun_li_genbio_ai/project/msa/bin/../bin/msa_retrieve_pipeline.bash: line 72: 4133157 Aborted

the process is as follows:

step1: build database

1.1 download env_db

1.2 tsv2exprofiledb and createindex

mmseqs tsv2exprofiledb  ${DATA_BASE_DIR}/${DBNAME} \
    ${DATA_BASE_DIR}/${DBNAME}_db --gpu 1

mmseqs createindex ${DATA_BASE_DIR}/${DBNAME}_db ${DATA_BASE_DIR}/tmp  \
    --remove-tmp-files 1 --split 1 --index-subset 2

step2: search

mmseqs createdb ${input_fasta_path} ${result_dir}/qdb

mmseqs search ${result_dir}/qdb \
        ${database_path}/${DB_NAME} \
        ${result_dir}/res_gpu \
        ${result_dir}/tmp \
        --num-iterations 3 --db-load-mode 0 -a -e 0.1 --max-seqs 10000 \
        --gpu 1 --prefilter-mode 1 #--gpu-server 1  #--threads 64 

mmseqs expandaln ${result_dir}/qdb \
        ${database_path}/${DB_NAME}.idx \
        ${result_dir}/res_gpu \
        ${database_path}/${DB_NAME}.idx \
        ${result_dir}/res_exp \
        --db-load-mode 0 --threads 64 --expansion-mode 0 -e inf \
        --expand-filter-clusters 0 --max-seq-id 0.95

camel2000 avatar May 21 '25 07:05 camel2000

@camel2000 Please upload the terminal output of the executed commands too

milot-mirdita avatar May 21 '25 08:05 milot-mirdita

PROJECT_DIR:~/project/msa/bin/..
use GPU: 1
createdb ~/project/msa/examples/one_query.fasta ~/project/msa/examples/demo/qdb 

Converting sequences

Time for merging to qdb_h: 0h 0m 0s 496ms
Time for merging to qdb: 0h 0m 0s 474ms
Database type: Aminoacid
Time for processing: 0h 0m 1s 808ms
Create directory ~/project/msa/examples/demo/tmp
search ~/project/msa/examples/demo/qdb /mnt/localssd/yakunli/data/level-1/colabfold_envdb_202108/colabfold_envdb_202108_db ~/project/msa/examples/demo/res_gpu ~/project/msa/examples/demo/tmp --num-iterations 3 --db-load-mode 2 -a -e 0.1 --max-seqs 10000 --gpu 1 --prefilter-mode 1 

ungappedprefilter ~/project/msa/examples/demo/qdb /mnt/localssd/yakunli/data/level-1/colabfold_envdb_202108/colabfold_envdb_202108_db.idx ~/project/msa/examples/demo/tmp/502615879695006624/pref_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -c 0 -e 0.1 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 1 --min-ungapped-score 15 --max-seqs 10000 --db-load-mode 2 --gpu 1 --gpu-server 0 --gpu-server-wait-timeout 600 --prefilter-mode 1 --threads 208 --compressed 0 -v 3 

Index version: 16
Generated by:  8783404eab75833dcb865153ed2e146431649efa
ScoreMatrix:  VTML80.out
[=================================================================] 100.00% 1 eta -
Time for merging to pref_0: 0h 0m 0s 20ms
Time for processing: 0h 0m 10s 391ms
align ~/project/msa/examples/demo/qdb /mnt/localssd/yakunli/data/level-1/colabfold_envdb_202108/colabfold_envdb_202108_db.idx ~/project/msa/examples/demo/tmp/502615879695006624/pref_0 ~/project/msa/examples/demo/tmp/502615879695006624/aln_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 1 --alignment-mode 2 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.1 --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 2 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 1 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 208 --compressed 0 -v 3 

Index version: 16
Generated by:  8783404eab75833dcb865153ed2e146431649efa
ScoreMatrix:  VTML80.out
Compute score only
Query database size: 1 type: Aminoacid
Target database size: 209335862 type: Aminoacid
Calculation of alignments
[=================================================================] 100.00% 1 eta -
Time for merging to aln_0: 0h 0m 0s 20ms
10000 alignments calculated
220 sequence pairs passed the thresholds (0.022000 of overall calculated)
220.000000 hits per query sequence
Time for processing: 0h 0m 0s 423ms
result2profile ~/project/msa/examples/demo/qdb /mnt/localssd/yakunli/data/level-1/colabfold_envdb_202108/colabfold_envdb_202108_db.idx ~/project/msa/examples/demo/tmp/502615879695006624/aln_0 ~/project/msa/examples/demo/tmp/502615879695006624/profile_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -e 0.1 --mask-profile 1 --e-profile 0.1 --comp-bias-corr 1 --comp-bias-corr-scale 1 --wg 0 --allow-deletion 0 --filter-msa 1 --filter-min-enable 0 --max-seq-id 0.9 --qid '0.0' --qsc -20 --cov 0 --diff 1000 --pseudo-cnt-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --db-load-mode 2 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --threads 208 --compressed 0 -v 3 --profile-output-mode 0 

Index version: 16
Generated by:  8783404eab75833dcb865153ed2e146431649efa
ScoreMatrix:  VTML80.out
Query database size: 1 type: Aminoacid
Target database size: 209335862 type: Aminoacid
[=================================================================] 100.00% 1 eta -
Time for merging to profile_0: 0h 0m 0s 22ms
Time for processing: 0h 0m 0s 134ms
ungappedprefilter ~/project/msa/examples/demo/tmp/502615879695006624/profile_0 /mnt/localssd/yakunli/data/level-1/colabfold_envdb_202108/colabfold_envdb_202108_db.idx ~/project/msa/examples/demo/tmp/502615879695006624/pref_tmp_1 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -c 0 -e 0.1 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 1 --min-ungapped-score 15 --max-seqs 10000 --db-load-mode 2 --gpu 1 --gpu-server 0 --gpu-server-wait-timeout 600 --prefilter-mode 1 --threads 208 --compressed 0 -v 3 

Index version: 16
Generated by:  8783404eab75833dcb865153ed2e146431649efa
ScoreMatrix:  VTML80.out
[=================================================================] 100.00% 1 eta -
Time for merging to pref_tmp_1: 0h 0m 0s 24ms
Time for processing: 0h 0m 10s 331ms
subtractdbs ~/project/msa/examples/demo/tmp/502615879695006624/pref_tmp_1 ~/project/msa/examples/demo/tmp/502615879695006624/aln_0 ~/project/msa/examples/demo/tmp/502615879695006624/pref_1 --threads 208 --e-profile 0.1 -e 0.1 --compressed 0 -v 3 

subtractdbs ~/project/msa/examples/demo/tmp/502615879695006624/pref_tmp_1 ~/project/msa/examples/demo/tmp/502615879695006624/aln_0 ~/project/msa/examples/demo/tmp/502615879695006624/pref_1 --threads 208 --e-profile 0.1 -e 0.1 --compressed 0 -v 3 

Remove ~/project/msa/examples/demo/tmp/502615879695006624/aln_0 ids from ~/project/msa/examples/demo/tmp/502615879695006624/pref_tmp_1
[=================================================================] 100.00% 1 eta -
Time for merging to pref_1: 0h 0m 1s 907ms
Time for processing: 0h 0m 4s 55ms
rmdb ~/project/msa/examples/demo/tmp/502615879695006624/pref_tmp_1 

Time for processing: 0h 0m 0s 12ms
align ~/project/msa/examples/demo/tmp/502615879695006624/profile_0 /mnt/localssd/yakunli/data/level-1/colabfold_envdb_202108/colabfold_envdb_202108_db.idx ~/project/msa/examples/demo/tmp/502615879695006624/pref_1 ~/project/msa/examples/demo/tmp/502615879695006624/aln_tmp_1 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 1 --alignment-mode 2 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.1 --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 2 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 208 --compressed 0 -v 3 

Index version: 16
Generated by:  8783404eab75833dcb865153ed2e146431649efa
ScoreMatrix:  VTML80.out
Compute score, coverage and sequence identity
Query database size: 1 type: Profile
Target database size: 209335862 type: Aminoacid
Calculation of alignments
[=================================================================] 100.00% 1 eta -
Time for merging to aln_tmp_1: 0h 0m 0s 21ms
9780 alignments calculated
34 sequence pairs passed the thresholds (0.003476 of overall calculated)
34.000000 hits per query sequence
Time for processing: 0h 0m 0s 606ms
mergedbs ~/project/msa/examples/demo/tmp/502615879695006624/profile_0 ~/project/msa/examples/demo/tmp/502615879695006624/aln_1 ~/project/msa/examples/demo/tmp/502615879695006624/aln_0 ~/project/msa/examples/demo/tmp/502615879695006624/aln_tmp_1 

Merging the results to ~/project/msa/examples/demo/tmp/502615879695006624/aln_1
[=================================================================] 100.00% 1 eta -
Time for merging to aln_1: 0h 0m 0s 29ms
Time for processing: 0h 0m 0s 79ms
rmdb ~/project/msa/examples/demo/tmp/502615879695006624/aln_0 

Time for processing: 0h 0m 0s 9ms
rmdb ~/project/msa/examples/demo/tmp/502615879695006624/aln_tmp_1 

Time for processing: 0h 0m 0s 9ms
result2profile ~/project/msa/examples/demo/tmp/502615879695006624/profile_0 /mnt/localssd/yakunli/data/level-1/colabfold_envdb_202108/colabfold_envdb_202108_db.idx ~/project/msa/examples/demo/tmp/502615879695006624/aln_1 ~/project/msa/examples/demo/tmp/502615879695006624/profile_1 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -e 0.1 --mask-profile 1 --e-profile 0.1 --comp-bias-corr 1 --comp-bias-corr-scale 1 --wg 0 --allow-deletion 0 --filter-msa 1 --filter-min-enable 0 --max-seq-id 0.9 --qid '0.0' --qsc -20 --cov 0 --diff 1000 --pseudo-cnt-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --db-load-mode 2 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --threads 208 --compressed 0 -v 3 --profile-output-mode 0 

Index version: 16
Generated by:  8783404eab75833dcb865153ed2e146431649efa
ScoreMatrix:  VTML80.out
Query database size: 1 type: Profile
Target database size: 209335862 type: Aminoacid
[=================================================================] 100.00% 1 eta -
Time for merging to profile_1: 0h 0m 0s 21ms
Time for processing: 0h 0m 0s 138ms
ungappedprefilter ~/project/msa/examples/demo/tmp/502615879695006624/profile_1 /mnt/localssd/yakunli/data/level-1/colabfold_envdb_202108/colabfold_envdb_202108_db.idx ~/project/msa/examples/demo/tmp/502615879695006624/pref_tmp_2 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -c 0 -e 0.1 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 1 --min-ungapped-score 15 --max-seqs 10000 --db-load-mode 2 --gpu 1 --gpu-server 0 --gpu-server-wait-timeout 600 --prefilter-mode 1 --threads 208 --compressed 0 -v 3 

Index version: 16
Generated by:  8783404eab75833dcb865153ed2e146431649efa
ScoreMatrix:  VTML80.out
[=================================================================] 100.00% 1 eta -
Time for merging to pref_tmp_2: 0h 0m 0s 23ms
Time for processing: 0h 0m 10s 416ms
subtractdbs ~/project/msa/examples/demo/tmp/502615879695006624/pref_tmp_2 ~/project/msa/examples/demo/tmp/502615879695006624/aln_1 ~/project/msa/examples/demo/tmp/502615879695006624/pref_2 --threads 208 --e-profile 0.1 -e 0.1 --compressed 0 -v 3 

subtractdbs ~/project/msa/examples/demo/tmp/502615879695006624/pref_tmp_2 ~/project/msa/examples/demo/tmp/502615879695006624/aln_1 ~/project/msa/examples/demo/tmp/502615879695006624/pref_2 --threads 208 --e-profile 0.1 -e 0.1 --compressed 0 -v 3 

Remove ~/project/msa/examples/demo/tmp/502615879695006624/aln_1 ids from ~/project/msa/examples/demo/tmp/502615879695006624/pref_tmp_2
[=================================================================] 100.00% 1 eta -
Time for merging to pref_2: 0h 0m 1s 890ms
Time for processing: 0h 0m 3s 994ms
rmdb ~/project/msa/examples/demo/tmp/502615879695006624/pref_tmp_2 

Time for processing: 0h 0m 0s 11ms
align ~/project/msa/examples/demo/tmp/502615879695006624/profile_1 /mnt/localssd/yakunli/data/level-1/colabfold_envdb_202108/colabfold_envdb_202108_db.idx ~/project/msa/examples/demo/tmp/502615879695006624/pref_2 ~/project/msa/examples/demo/tmp/502615879695006624/aln_tmp_2 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 1 --alignment-mode 2 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.1 --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 2 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 208 --compressed 0 -v 3 

Index version: 16
Generated by:  8783404eab75833dcb865153ed2e146431649efa
ScoreMatrix:  VTML80.out
Compute score, coverage and sequence identity
Query database size: 1 type: Profile
Target database size: 209335862 type: Aminoacid
Calculation of alignments
[=================================================================] 100.00% 1 eta -
Time for merging to aln_tmp_2: 0h 0m 0s 20ms
9746 alignments calculated
94 sequence pairs passed the thresholds (0.009645 of overall calculated)
94.000000 hits per query sequence
Time for processing: 0h 0m 0s 592ms
mergedbs ~/project/msa/examples/demo/tmp/502615879695006624/profile_1 ~/project/msa/examples/demo/res_gpu ~/project/msa/examples/demo/tmp/502615879695006624/aln_1 ~/project/msa/examples/demo/tmp/502615879695006624/aln_tmp_2 

Merging the results to ~/project/msa/examples/demo/res_gpu
[=================================================================] 100.00% 1 eta -
Time for merging to res_gpu: 0h 0m 0s 23ms
Time for processing: 0h 0m 0s 75ms
rmdb ~/project/msa/examples/demo/tmp/502615879695006624/aln_1 

Time for processing: 0h 0m 0s 10ms
rmdb ~/project/msa/examples/demo/tmp/502615879695006624/aln_tmp_2 

Time for processing: 0h 0m 0s 9ms
search cost: 44 s
expandaln ~/project/msa/examples/demo/qdb /mnt/localssd/yakunli/data/level-1/colabfold_envdb_202108/colabfold_envdb_202108_db.idx ~/project/msa/examples/demo/res_gpu /mnt/localssd/yakunli/data/level-1/colabfold_envdb_202108/colabfold_envdb_202108_db.idx ~/project/msa/examples/demo/res_exp --db-load-mode 0 --threads 64 --expansion-mode 0 -e inf --expand-filter-clusters 0 --max-seq-id 0.95 

Index version: 16
Generated by:  8783404eab75833dcb865153ed2e146431649efa
ScoreMatrix:  VTML80.out
Index version: 16
Generated by:  8783404eab75833dcb865153ed2e146431649efa
ScoreMatrix:  VTML80.out
[=================================================================] 100.00% 1 eta -
terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string: construction from null is not valid
~/project/msa/bin/../bin/msa_retrieve_pipeline.bash: line 72: 4138815 Aborted                 mmseqs expandaln ${result_dir}/qdb ${database_path}/${DB_NAME}.idx ${result_dir}/res_gpu ${database_path}/${DB_NAME}.idx ${result_dir}/res_exp --db-load-mode 0 --threads 64 --expansion-mode 0 -e inf --expand-filter-clusters 0 --max-seq-id 0.95

@milot-mirdita this is the terminal output of the executed commands, thanks very much

camel2000 avatar May 21 '25 09:05 camel2000

@camel2000 would it be possible to upload or send me the query sequence?

milot-mirdita avatar May 21 '25 10:05 milot-mirdita

@milot-mirdita thanks, this is the query fasta content

>tr|A7TBS3|A7TBS3_NEMVE Predicted protein (Fragment) OS=Nematostella vectensis GN=v1g153959 PE=4 SV=1 Split=0 
VCIHTENQNQVSFYPFVLHEISVLIELTLGHLRYRLTDVPPQPNSQPDSATNYVWML

camel2000 avatar May 21 '25 11:05 camel2000

@camel2000 I cannot get that sequence to crash on my side.

I tried exactly the commands you listed above: https://github.com/sokrypton/ColabFold/issues/691#issuecomment-2896838973

Could you rerun the crashing expandaln command after running unset MMSEQS_CALL_DEPTH, so that it prints more debug information?

unset MMSEQS_CALL_DEPTH
mmseqs expandaln ~/project/msa/examples/demo/qdb /mnt/localssd/yakunli/data/level-1/colabfold_envdb_202108/colabfold_envdb_202108_db.idx ~/project/msa/examples/demo/res_gpu /mnt/localssd/yakunli/data/level-1/colabfold_envdb_202108/colabfold_envdb_202108_db.idx ~/project/msa/examples/demo/res_exp --db-load-mode 0 --threads 64 --expansion-mode 0 -e inf --expand-filter-clusters 0 --max-seq-id 0.95 

Could you also pack qdb* and res_gpu* into a tar and send this to me/upload it here?
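
Packing the requested files could look like the sketch below. The function name is made up for this example; the glob patterns qdb* and res_gpu* are the ones named above and also pick up the companion files such as .index, .dbtype and qdb_h.

```shell
# pack_repro RESULT_DIR OUT_TARBALL
# Bundle the query database and the GPU search result for a bug report.
# OUT_TARBALL should be an absolute path, because the archive is created
# from inside RESULT_DIR so that it contains plain file names only.
pack_repro() {
  local dir="$1" out="$2"
  ( cd "$dir" && tar -czf "$out" qdb* res_gpu* )
}

# Usage (placeholder paths):
#   pack_repro ~/project/msa/examples/demo "$HOME/expandaln_repro.tar.gz"
```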

milot-mirdita avatar May 22 '25 05:05 milot-mirdita

You know what, I succeeded on the uniref30_2302 dataset, but failed many times on colabfold_envdb_202108 (GPU version) @milot-mirdita

camel2000 avatar May 22 '25 12:05 camel2000

I am trying to reproduce your issue on the colabfolddb, but I am failing at that. Please try the steps above, maybe these will help to diagnose what's wrong

milot-mirdita avatar May 22 '25 12:05 milot-mirdita

I am trying to (I started over again). When I have something new, I will let you know. @milot-mirdita thanks very much

camel2000 avatar May 22 '25 12:05 camel2000

@milot-mirdita I think I have solved all the bugs. The failure on the colabfolddb was because there was not enough space on /tmp. Only one line of output indicates the insufficient space, and it is buried among numerous log lines and hard to spot.
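
For anyone hitting the same thing, a pre-flight check along these lines may catch it earlier. It is only a sketch: the function names and the threshold are placeholders, and because the exact wording of the MMseqs2 message is not quoted in this thread, the log scan assumes the standard operating-system text "No space left on device".

```shell
# free_kb DIR: available space, in KiB, on the filesystem that holds DIR.
free_kb() {
  df -Pk "$1" | awk 'NR==2 {print $4}'
}

# require_space DIR MIN_KB: fail with a clear message if DIR has less than
# MIN_KB KiB available.
require_space() {
  local have
  have=$(free_kb "$1")
  [ -n "$have" ] || { echo "cannot determine free space for $1" >&2; return 2; }
  if [ "$have" -lt "$2" ]; then
    echo "only ${have} KiB free in $1, need at least $2" >&2
    return 1
  fi
}

# scan_log FILE: print any out-of-space lines with their line numbers.
# The pattern is an assumption about what the message looks like.
scan_log() {
  grep -n -i 'no space left' "$1"
}

# Usage (placeholder threshold and paths):
#   require_space /tmp 104857600 || exit 1
#   require_space "${result_dir}/tmp" 104857600 || exit 1
#   scan_log pipeline.log
```

Checking both /tmp and the tmp directory passed to mmseqs search covers the two places where temporary files could end up in the scripts above.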

camel2000 avatar May 24 '25 09:05 camel2000