localcolabfold raises error "malloc(): invalid size (unsorted)" during expandaln
I was testing localcolabfold 1.5.5 with a list of protein complexes. colabfold_search with GPU seems to produce all a3m files without any issues, but when I tried to use the CPU, I got the error "corrupted size vs. prev_size".
Then I tried individual complexes; some worked while others didn't. I got the error message "malloc(): invalid size (unsorted)" during expandaln. I've tried running with and without indices, with indices built on different hardware configurations, and with mmseqs15, but none of these resolved the issue. An example is attached.
colabfold_search-62781089.txt complex2.txt localcolabfold_msa.txt
I think I am having a similar issue. I am using colabfold_search with a single protein sequence. I am running the following command on a CPU:
MMSEQS_NO_INDEX=1 colabfold_search test1_0.fasta results --threads 16 --prefilter-mode 1
The run starts successfully but, as above, fails at the expandaln call. The specific command that causes the failure is:
expandaln results/qdb ../MSA_Databases/uniref30_2302_db.idx results/res ../MSA_Databases/uniref30_2302_db.idx results/res_exp --db-load-mode 1 --threads 1 --expansion-mode 0 -e inf --expand-filter-clusters 1 --max-seq-id 0.95
I have tried a variety of options and monitored memory usage to make sure memory was still available. Depending on the exact environment, I get one of two errors:
- "malloc(): invalid size (unsorted)" (as above)
- "free(): corrupted unsorted chunks"
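Since these messages can be buried in long logs, a small helper may make them easier to spot. This is a minimal sketch that searches a log file for the three allocator messages quoted in this thread; the function name `find_heap_errors` and the log file name in the usage line are hypothetical.

```shell
# Print the log lines (with line numbers) that contain one of the glibc
# allocator messages reported in this thread. -F matches the strings literally.
# Returns a non-zero status when none of the messages is present.
find_heap_errors() {
    grep -n -F \
        -e 'malloc(): invalid size' \
        -e 'free(): corrupted unsorted chunks' \
        -e 'corrupted size vs. prev_size' \
        "$1"
}
```

Usage: `find_heap_errors colabfold_search.log`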
@alatham13 Thanks for sharing the information. I think I also encountered the second error message but probably 1-2 times. Could you please share/attach your log files for troubleshooting?
@yanj14jy15 Thank you! Here is an example log file
I think this issue occurs because the database was built with an older version of MMseqs2, while the latest version of colabfold_search uses the latest version of MMseqs2. Using the binaries from an older version of MMseqs2 fixed this issue for me. However, folding with colabfold_batch, using a templates .m8 file and a local database that I built, now gives me this error:
Traceback (most recent call last):
File "/usr/local/envs/colabfold/lib/python3.11/site-packages/colabfold/batch.py", line 1461, in run
= get_msa_and_templates(jobname, query_seqs_unique, unpaired_msa, result_dir, 'single_sequence', use_templates,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/envs/colabfold/lib/python3.11/site-packages/colabfold/batch.py", line 781, in get_msa_and_templates
template_feature = mk_template(
^^^^^^^^^^^^
File "/usr/local/envs/colabfold/lib/python3.11/site-packages/colabfold/batch.py", line 133, in mk_template
hhsearch_hits = pipeline.parsers.parse_hhr(hhsearch_result)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/envs/colabfold/lib/python3.11/site-packages/alphafold/data/parsers.py", line 505, in parse_hhr
hits.append(_parse_hhr_hit(lines[block_starts[i]:block_starts[i + 1]]))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/envs/colabfold/lib/python3.11/site-packages/alphafold/data/parsers.py", line 446, in _parse_hhr_hit
groups = _get_hhr_line_regex_groups(patt, line[17:])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/envs/colabfold/lib/python3.11/site-packages/alphafold/data/parsers.py", line 379, in _get_hhr_line_regex_groups
raise RuntimeError(f'Could not parse query line {line}')
RuntimeError: Could not parse query line
I used mmseqs17 in colabfold-conda bin to build the index and perform colabfold_search. So I tend to think it's not the issue in my case. I also tried mmseqs15 but the issue was not resolved.
Hello, I ran into a similar problem during mmseqs expandaln. My versions of colabfold and mmseqs are both the latest. I have tried many versions of mmseqs, but every time the result was the same as shown below.
Invalid database read for database data file=XML/db/hits_db.idx, database index=XML/db/hits_db.idx.index getData: local id (4294967295) >= db size (22)
Thank you @punit-jha123. Moving to MMseqs release 15 solved the issue for me.
@alatham13, thank you. Yes, we do have a bug in the current version. The current version works for expanded databases on GPU, but not on CPU. For the CPU version, version 15 is needed. You also need to build the DBs with the respective version. We are planning to fix this soon.
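One way to check the requirement above, that the databases are built with the same version used for searching, is to look at the `Generated by:` lines that MMseqs2 prints when it opens an index (they appear in the terminal output later in this thread). This is a minimal sketch, assuming that log format; the function name `index_builders` and the log file name are hypothetical.

```shell
# List the distinct builds that generated the indices seen in an MMseqs2 log.
# Assumes the log contains lines of the form "Generated by: <hash or version>".
index_builders() {
    sed -n 's/^Generated by:[[:space:]]*//p' "$1" | sort -u
}
```

Usage: `index_builders search.log`, then compare the result with the output of `mmseqs version` for the binary in use. If more than one value is printed, or the value differs from the binary's version, the databases and the binary probably do not come from the same build.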
@dklurker have you solved this bug yet? I have the same problem.
@camel2000 Please refer to this page. I solved this problem by following "Starting from scratch" in https://github.com/sokrypton/ColabFold/wiki/Creating-expandable-search-databases
@dklurker thanks for pointing out "Starting from scratch".
Conclusion:
- GPU search ran at about twice the speed of the CPU version
- the expandaln step still fails when I run the GPU version of the pipeline
Below are the scripts I'm using:
build_database.bash
export CUDA_VISIBLE_DEVICES=1
export MMSEQS_CALL_DEPTH=1
DATA_DIR=/mnt/localssd/yakunli/data/level-1/demo
FASTA=/home/yakun_li_genbio_ai/project/msa/examples/DB.fasta
DBNAME=targetDB
####################################################################################################
mmseqs createdb ${FASTA} ${DATA_DIR}/seqdb
# mmseqs makepaddedseqdb ${DATA_DIR}/seqdb_ ${DATA_DIR}/seqdb
# parameter choice is very important here, generally you want to cluster to a low sequence identity however keep a high coverage.
# Without a high coverage, we might lose a domain in the representative sequence and then not be able to find the domain in any of the members anymore, since we always first need to match the cluster representative
mmseqs cluster ${DATA_DIR}/seqdb ${DATA_DIR}/clu ${DATA_DIR}/tmp --min-seq-id 0.3 -c 0.8
# disable E-value threshold with -e inf, accept everything that was clustered
mmseqs align ${DATA_DIR}/seqdb ${DATA_DIR}/seqdb ${DATA_DIR}/clu ${DATA_DIR}/aln -a -e inf
mmseqs result2profile ${DATA_DIR}/seqdb ${DATA_DIR}/seqdb ${DATA_DIR}/aln ${DATA_DIR}/prof
mmseqs profile2consensus ${DATA_DIR}/prof ${DATA_DIR}/cons
mmseqs prefixid ${DATA_DIR}/cons ${DATA_DIR}/${DBNAME}.tsv --tsv --threads 1
mmseqs prefixid ${DATA_DIR}/seqdb ${DATA_DIR}/${DBNAME}_seq.tsv --tsv --threads 1
mmseqs prefixid ${DATA_DIR}/seqdb_h ${DATA_DIR}/${DBNAME}_h.tsv --tsv --threads 1
mmseqs prefixid ${DATA_DIR}/aln ${DATA_DIR}/${DBNAME}_aln.tsv --tsv --threads 1
mmseqs tsv2exprofiledb ${DBNAME} ${DATA_DIR}/${DBNAME}
mmseqs createindex ${DATA_DIR}/${DBNAME} ${DATA_DIR}/tmp
#build GPU padded database
mmseqs makepaddedseqdb ${DATA_DIR}/${DBNAME} ${DATA_DIR}/${DBNAME}_gpu
#build GPU database index
mmseqs createindex ${DATA_DIR}/${DBNAME}_gpu ${DATA_DIR}/tmp --index-subset 2 --split 1
nohup mmseqs gpuserver ${DATA_DIR}/${DBNAME}_gpu > ${DATA_DIR}/gpuserver.log 2>&1 &
cpu version search pipeline:
start_time=$(date +%s)
mkdir -p ${result_dir}
mmseqs createdb ${input_fasta_path} ${result_dir}/qdb
mmseqs search ${result_dir}/qdb ${database_path} \
${result_dir}/res ${result_dir}/tmp \
--num-iterations 3 --db-load-mode 0 -s 8 -e 0.1 --max-seqs 10000 -a #--threads 128
end_time=$(date +%s) # record end time
elapsed_time=$((end_time - start_time)) # compute time cost
echo "search cost: $elapsed_time s"
mmseqs expandaln ${result_dir}/qdb ${database_path}.idx \
${result_dir}/res ${database_path}.idx ${result_dir}/res_exp \
--db-load-mode 2 --expansion-mode 0 -e inf \
--expand-filter-clusters 1 --max-seq-id 0.95
mmseqs mvdb ${result_dir}/tmp/latest/profile_1 ${result_dir}/prof_res
mmseqs lndb ${result_dir}/qdb_h ${result_dir}/prof_res_h
mmseqs align ${result_dir}/prof_res ${database_path}.idx \
${result_dir}/res_exp ${result_dir}/res_exp_realign \
--db-load-mode 2 -e 10 --max-accept 100000 --alt-ali 10 -a
mmseqs filterresult ${result_dir}/qdb ${database_path}.idx \
${result_dir}/res_exp_realign ${result_dir}/res_exp_realign_filter \
--db-load-mode 2 --qid 0 --qsc 0.8 --diff 0 --max-seq-id 1.0 \
--filter-min-enable 100
mmseqs result2msa ${result_dir}/qdb ${database_path}.idx \
${result_dir}/res_exp_realign_filter ${result_dir}/${out_a3m} \
--msa-format-mode 6 --db-load-mode 2 --filter-msa 1 \
--filter-min-enable 1000 --diff 3000 \
--qid 0.0,0.2,0.4,0.6,0.8,1.0 --qsc 0 --max-seq-id 0.95
head ${result_dir}/${out_a3m}
mmseqs rmdb ${result_dir}/res_exp_realign_filter
mmseqs rmdb ${result_dir}/res_exp_realign
mmseqs rmdb "${result_dir}/res_exp"
mmseqs rmdb ${result_dir}/res
mmseqs rmdb ${result_dir}/qdb
mmseqs rmdb ${result_dir}/qdb_h
end_time=$(date +%s) # record end time
elapsed_time=$((end_time - start_time)) # compute time cost
echo "cost: $elapsed_time s"
gpu version pipeline:
start_time=$(date +%s)
mkdir -p ${result_dir}
mmseqs createdb ${input_fasta_path} ${result_dir}/qdb
mmseqs search ${result_dir}/qdb ${database_path}/${online_serve}_gpu ${result_dir}/res_gpu ${result_dir}/tmp \
--num-iterations 3 --db-load-mode 0 -a -e 0.1 --max-seqs 10000 --gpu 1 --prefilter-mode 1 --gpu-server 1 # --threads 64
end_time=$(date +%s) # record end time
elapsed_time=$((end_time - start_time)) # compute time cost
echo "search cost: $elapsed_time s"
mmseqs expandaln ${result_dir}/qdb \
${database_path}/${online_serve}_gpu.idx \
${result_dir}/res_gpu \
${database_path}/${online_serve}_gpu.idx \
${result_dir}/res_exp \
--db-load-mode 0 --threads 64 --expansion-mode 0 -e inf \
--expand-filter-clusters 0 --max-seq-id 0.95
What is the error?
Now I can use the GPU to build a database and then retrieve MSAs from input_DB.fasta. But for colabfold_envdb_202108, should we build the database from .fasta, or just start from the tsv2exprofiledb step? I tried to retrieve an MSA from colabfold_envdb_202108 as follows:
step 1: download the data (http://wwwuser.gwdg.de/~compbiol/colabfold/colabfold_envdb_202108.tar.gz)
step 2: run tsv2exprofiledb and createindex
step 3: search for the MSA (search + expandaln + lndb + align + ....)
This failed in the expandaln step with an error like "malloc(): invalid size (unsorted)".
mmseqs tsv2exprofiledb ${DATA_BASE_DIR}/${DBNAME} ${DATA_BASE_DIR}/${DBNAME} --gpu 1
mmseqs createindex ${DATA_BASE_DIR}/${DBNAME} ${DATA_BASE_DIR}/tmp --split 1 --index-subset 2
The build-database script is below (this script works well when building a database from a .fasta file):
mmseqs createdb ${FASTA} ${DATA_DIR}/seqdb
# parameter choice is very important here, generally you want to cluster to a low sequence identity however keep a high coverage.
# Without a high coverage, we might lose a domain in the representative sequence and then not be able to find the domain in any of the members anymore, since we always first need to match the cluster representative
mmseqs cluster ${DATA_DIR}/seqdb ${DATA_DIR}/clu ${DATA_DIR}/tmp --min-seq-id 0.3 -c 0.8
# disable E-value threshold with -e inf, accept everything that was clustered
mmseqs align ${DATA_DIR}/seqdb ${DATA_DIR}/seqdb ${DATA_DIR}/clu ${DATA_DIR}/aln -a -e inf
mmseqs result2profile ${DATA_DIR}/seqdb ${DATA_DIR}/seqdb ${DATA_DIR}/aln ${DATA_DIR}/prof
mmseqs profile2consensus ${DATA_DIR}/prof ${DATA_DIR}/cons
mmseqs prefixid ${DATA_DIR}/cons ${DATA_DIR}/${DBNAME}.tsv --tsv --threads 1
mmseqs prefixid ${DATA_DIR}/seqdb ${DATA_DIR}/${DBNAME}_seq.tsv --tsv --threads 1
mmseqs prefixid ${DATA_DIR}/seqdb_h ${DATA_DIR}/${DBNAME}_h.tsv --tsv --threads 1
mmseqs prefixid ${DATA_DIR}/aln ${DATA_DIR}/${DBNAME}_aln.tsv --tsv --threads 1
mmseqs tsv2exprofiledb ${DBNAME} ${DATA_DIR}/${DBNAME} --gpu 1
mmseqs createindex ${DATA_DIR}/${DBNAME} ${DATA_DIR}/tmp
# # build GPU padded database
mmseqs makepaddedseqdb ${DATA_DIR}/${DBNAME} ${DATA_DIR}/${DBNAME}_gpu
# # build GPU database index
mmseqs createindex ${DATA_DIR}/${DBNAME}_gpu ${DATA_DIR}/tmp --index-subset 2 --split 1
@yanj14jy15 I think MMseqs2 version 15 does not support GPU. How did you test it with a GPU?
The following MMseqs2 build should fix the issue for both CPU and GPU: https://mmseqs.com/archive/8783404eab75833dcb865153ed2e146431649efa
You can download the precompiled binary above (likely the GPU-enabled Linux binary: https://mmseqs.com/archive/8783404eab75833dcb865153ed2e146431649efa/mmseqs-linux-gpu.tar.gz ) and pass the mmseqs binary contained within to colabfold_search --mmseqs path-to-binary.
Please let me know if this works.
Thanks for your work; I hope this version works. Also, I wonder why makepaddedseqdb is not run on the uniref30 and colabfold_envdb_202108 databases in setup_databases.sh when I enable GPU.
tsv2exprofiledb in setup_databases.sh calls makepaddedseqdb
@milot-mirdita I tried https://mmseqs.com/archive/8783404eab75833dcb865153ed2e146431649efa , but it still failed at 'expandaln'.
error:
terminate called after throwing an instance of 'std::logic_error'
what(): basic_string::_S_construct null not valid
/home/yakun_li_genbio_ai/project/msa/bin/../bin/msa_retrieve_pipeline.bash: line 72: 4133157 Aborted
the process is as follows:
step1: build database
1.1 download env_db
1.2 tsv2exprofiledb and createindex
mmseqs tsv2exprofiledb ${DATA_BASE_DIR}/${DBNAME} \
${DATA_BASE_DIR}/${DBNAME}_db --gpu 1
mmseqs createindex ${DATA_BASE_DIR}/${DBNAME}_db ${DATA_BASE_DIR}/tmp \
--remove-tmp-files 1 --split 1 --index-subset 2
step2: search
mmseqs createdb ${input_fasta_path} ${result_dir}/qdb
mmseqs search ${result_dir}/qdb \
${database_path}/${DB_NAME} \
${result_dir}/res_gpu \
${result_dir}/tmp \
--num-iterations 3 --db-load-mode 0 -a -e 0.1 --max-seqs 10000 \
--gpu 1 --prefilter-mode 1 #--gpu-server 1 #--threads 64
mmseqs expandaln ${result_dir}/qdb \
${database_path}/${DB_NAME}.idx \
${result_dir}/res_gpu \
${database_path}/${DB_NAME}.idx \
${result_dir}/res_exp \
--db-load-mode 0 --threads 64 --expansion-mode 0 -e inf \
--expand-filter-clusters 0 --max-seq-id 0.95
@camel2000 Please upload the terminal output of the executed commands too
PROJECT_DIR:~/project/msa/bin/..
use GPU: 1
createdb ~/project/msa/examples/one_query.fasta ~/project/msa/examples/demo/qdb
Converting sequences
Time for merging to qdb_h: 0h 0m 0s 496ms
Time for merging to qdb: 0h 0m 0s 474ms
Database type: Aminoacid
Time for processing: 0h 0m 1s 808ms
Create directory ~/project/msa/examples/demo/tmp
search ~/project/msa/examples/demo/qdb /mnt/localssd/yakunli/data/level-1/colabfold_envdb_202108/colabfold_envdb_202108_db ~/project/msa/examples/demo/res_gpu ~/project/msa/examples/demo/tmp --num-iterations 3 --db-load-mode 2 -a -e 0.1 --max-seqs 10000 --gpu 1 --prefilter-mode 1
ungappedprefilter ~/project/msa/examples/demo/qdb /mnt/localssd/yakunli/data/level-1/colabfold_envdb_202108/colabfold_envdb_202108_db.idx ~/project/msa/examples/demo/tmp/502615879695006624/pref_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -c 0 -e 0.1 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 1 --min-ungapped-score 15 --max-seqs 10000 --db-load-mode 2 --gpu 1 --gpu-server 0 --gpu-server-wait-timeout 600 --prefilter-mode 1 --threads 208 --compressed 0 -v 3
Index version: 16
Generated by: 8783404eab75833dcb865153ed2e146431649efa
ScoreMatrix: VTML80.out
[=================================================================] 100.00% 1 eta -
Time for merging to pref_0: 0h 0m 0s 20ms
Time for processing: 0h 0m 10s 391ms
align ~/project/msa/examples/demo/qdb /mnt/localssd/yakunli/data/level-1/colabfold_envdb_202108/colabfold_envdb_202108_db.idx ~/project/msa/examples/demo/tmp/502615879695006624/pref_0 ~/project/msa/examples/demo/tmp/502615879695006624/aln_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 1 --alignment-mode 2 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.1 --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 2 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 1 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 208 --compressed 0 -v 3
Index version: 16
Generated by: 8783404eab75833dcb865153ed2e146431649efa
ScoreMatrix: VTML80.out
Compute score only
Query database size: 1 type: Aminoacid
Target database size: 209335862 type: Aminoacid
Calculation of alignments
[=================================================================] 100.00% 1 eta -
Time for merging to aln_0: 0h 0m 0s 20ms
10000 alignments calculated
220 sequence pairs passed the thresholds (0.022000 of overall calculated)
220.000000 hits per query sequence
Time for processing: 0h 0m 0s 423ms
result2profile ~/project/msa/examples/demo/qdb /mnt/localssd/yakunli/data/level-1/colabfold_envdb_202108/colabfold_envdb_202108_db.idx ~/project/msa/examples/demo/tmp/502615879695006624/aln_0 ~/project/msa/examples/demo/tmp/502615879695006624/profile_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -e 0.1 --mask-profile 1 --e-profile 0.1 --comp-bias-corr 1 --comp-bias-corr-scale 1 --wg 0 --allow-deletion 0 --filter-msa 1 --filter-min-enable 0 --max-seq-id 0.9 --qid '0.0' --qsc -20 --cov 0 --diff 1000 --pseudo-cnt-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --db-load-mode 2 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --threads 208 --compressed 0 -v 3 --profile-output-mode 0
Index version: 16
Generated by: 8783404eab75833dcb865153ed2e146431649efa
ScoreMatrix: VTML80.out
Query database size: 1 type: Aminoacid
Target database size: 209335862 type: Aminoacid
[=================================================================] 100.00% 1 eta -
Time for merging to profile_0: 0h 0m 0s 22ms
Time for processing: 0h 0m 0s 134ms
ungappedprefilter ~/project/msa/examples/demo/tmp/502615879695006624/profile_0 /mnt/localssd/yakunli/data/level-1/colabfold_envdb_202108/colabfold_envdb_202108_db.idx ~/project/msa/examples/demo/tmp/502615879695006624/pref_tmp_1 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -c 0 -e 0.1 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 1 --min-ungapped-score 15 --max-seqs 10000 --db-load-mode 2 --gpu 1 --gpu-server 0 --gpu-server-wait-timeout 600 --prefilter-mode 1 --threads 208 --compressed 0 -v 3
Index version: 16
Generated by: 8783404eab75833dcb865153ed2e146431649efa
ScoreMatrix: VTML80.out
[=================================================================] 100.00% 1 eta -
Time for merging to pref_tmp_1: 0h 0m 0s 24ms
Time for processing: 0h 0m 10s 331ms
subtractdbs ~/project/msa/examples/demo/tmp/502615879695006624/pref_tmp_1 ~/project/msa/examples/demo/tmp/502615879695006624/aln_0 ~/project/msa/examples/demo/tmp/502615879695006624/pref_1 --threads 208 --e-profile 0.1 -e 0.1 --compressed 0 -v 3
subtractdbs ~/project/msa/examples/demo/tmp/502615879695006624/pref_tmp_1 ~/project/msa/examples/demo/tmp/502615879695006624/aln_0 ~/project/msa/examples/demo/tmp/502615879695006624/pref_1 --threads 208 --e-profile 0.1 -e 0.1 --compressed 0 -v 3
Remove ~/project/msa/examples/demo/tmp/502615879695006624/aln_0 ids from ~/project/msa/examples/demo/tmp/502615879695006624/pref_tmp_1
[=================================================================] 100.00% 1 eta -
Time for merging to pref_1: 0h 0m 1s 907ms
Time for processing: 0h 0m 4s 55ms
rmdb ~/project/msa/examples/demo/tmp/502615879695006624/pref_tmp_1
Time for processing: 0h 0m 0s 12ms
align ~/project/msa/examples/demo/tmp/502615879695006624/profile_0 /mnt/localssd/yakunli/data/level-1/colabfold_envdb_202108/colabfold_envdb_202108_db.idx ~/project/msa/examples/demo/tmp/502615879695006624/pref_1 ~/project/msa/examples/demo/tmp/502615879695006624/aln_tmp_1 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 1 --alignment-mode 2 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.1 --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 2 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 208 --compressed 0 -v 3
Index version: 16
Generated by: 8783404eab75833dcb865153ed2e146431649efa
ScoreMatrix: VTML80.out
Compute score, coverage and sequence identity
Query database size: 1 type: Profile
Target database size: 209335862 type: Aminoacid
Calculation of alignments
[=================================================================] 100.00% 1 eta -
Time for merging to aln_tmp_1: 0h 0m 0s 21ms
9780 alignments calculated
34 sequence pairs passed the thresholds (0.003476 of overall calculated)
34.000000 hits per query sequence
Time for processing: 0h 0m 0s 606ms
mergedbs ~/project/msa/examples/demo/tmp/502615879695006624/profile_0 ~/project/msa/examples/demo/tmp/502615879695006624/aln_1 ~/project/msa/examples/demo/tmp/502615879695006624/aln_0 ~/project/msa/examples/demo/tmp/502615879695006624/aln_tmp_1
Merging the results to ~/project/msa/examples/demo/tmp/502615879695006624/aln_1
[=================================================================] 100.00% 1 eta -
Time for merging to aln_1: 0h 0m 0s 29ms
Time for processing: 0h 0m 0s 79ms
rmdb ~/project/msa/examples/demo/tmp/502615879695006624/aln_0
Time for processing: 0h 0m 0s 9ms
rmdb ~/project/msa/examples/demo/tmp/502615879695006624/aln_tmp_1
Time for processing: 0h 0m 0s 9ms
result2profile ~/project/msa/examples/demo/tmp/502615879695006624/profile_0 /mnt/localssd/yakunli/data/level-1/colabfold_envdb_202108/colabfold_envdb_202108_db.idx ~/project/msa/examples/demo/tmp/502615879695006624/aln_1 ~/project/msa/examples/demo/tmp/502615879695006624/profile_1 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -e 0.1 --mask-profile 1 --e-profile 0.1 --comp-bias-corr 1 --comp-bias-corr-scale 1 --wg 0 --allow-deletion 0 --filter-msa 1 --filter-min-enable 0 --max-seq-id 0.9 --qid '0.0' --qsc -20 --cov 0 --diff 1000 --pseudo-cnt-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --db-load-mode 2 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --threads 208 --compressed 0 -v 3 --profile-output-mode 0
Index version: 16
Generated by: 8783404eab75833dcb865153ed2e146431649efa
ScoreMatrix: VTML80.out
Query database size: 1 type: Profile
Target database size: 209335862 type: Aminoacid
[=================================================================] 100.00% 1 eta -
Time for merging to profile_1: 0h 0m 0s 21ms
Time for processing: 0h 0m 0s 138ms
ungappedprefilter ~/project/msa/examples/demo/tmp/502615879695006624/profile_1 /mnt/localssd/yakunli/data/level-1/colabfold_envdb_202108/colabfold_envdb_202108_db.idx ~/project/msa/examples/demo/tmp/502615879695006624/pref_tmp_2 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -c 0 -e 0.1 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 1 --min-ungapped-score 15 --max-seqs 10000 --db-load-mode 2 --gpu 1 --gpu-server 0 --gpu-server-wait-timeout 600 --prefilter-mode 1 --threads 208 --compressed 0 -v 3
Index version: 16
Generated by: 8783404eab75833dcb865153ed2e146431649efa
ScoreMatrix: VTML80.out
[=================================================================] 100.00% 1 eta -
Time for merging to pref_tmp_2: 0h 0m 0s 23ms
Time for processing: 0h 0m 10s 416ms
subtractdbs ~/project/msa/examples/demo/tmp/502615879695006624/pref_tmp_2 ~/project/msa/examples/demo/tmp/502615879695006624/aln_1 ~/project/msa/examples/demo/tmp/502615879695006624/pref_2 --threads 208 --e-profile 0.1 -e 0.1 --compressed 0 -v 3
subtractdbs ~/project/msa/examples/demo/tmp/502615879695006624/pref_tmp_2 ~/project/msa/examples/demo/tmp/502615879695006624/aln_1 ~/project/msa/examples/demo/tmp/502615879695006624/pref_2 --threads 208 --e-profile 0.1 -e 0.1 --compressed 0 -v 3
Remove ~/project/msa/examples/demo/tmp/502615879695006624/aln_1 ids from ~/project/msa/examples/demo/tmp/502615879695006624/pref_tmp_2
[=================================================================] 100.00% 1 eta -
Time for merging to pref_2: 0h 0m 1s 890ms
Time for processing: 0h 0m 3s 994ms
rmdb ~/project/msa/examples/demo/tmp/502615879695006624/pref_tmp_2
Time for processing: 0h 0m 0s 11ms
align ~/project/msa/examples/demo/tmp/502615879695006624/profile_1 /mnt/localssd/yakunli/data/level-1/colabfold_envdb_202108/colabfold_envdb_202108_db.idx ~/project/msa/examples/demo/tmp/502615879695006624/pref_2 ~/project/msa/examples/demo/tmp/502615879695006624/aln_tmp_2 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 1 --alignment-mode 2 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.1 --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 2 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 208 --compressed 0 -v 3
Index version: 16
Generated by: 8783404eab75833dcb865153ed2e146431649efa
ScoreMatrix: VTML80.out
Compute score, coverage and sequence identity
Query database size: 1 type: Profile
Target database size: 209335862 type: Aminoacid
Calculation of alignments
[=================================================================] 100.00% 1 eta -
Time for merging to aln_tmp_2: 0h 0m 0s 20ms
9746 alignments calculated
94 sequence pairs passed the thresholds (0.009645 of overall calculated)
94.000000 hits per query sequence
Time for processing: 0h 0m 0s 592ms
mergedbs ~/project/msa/examples/demo/tmp/502615879695006624/profile_1 ~/project/msa/examples/demo/res_gpu ~/project/msa/examples/demo/tmp/502615879695006624/aln_1 ~/project/msa/examples/demo/tmp/502615879695006624/aln_tmp_2
Merging the results to ~/project/msa/examples/demo/res_gpu
[=================================================================] 100.00% 1 eta -
Time for merging to res_gpu: 0h 0m 0s 23ms
Time for processing: 0h 0m 0s 75ms
rmdb ~/project/msa/examples/demo/tmp/502615879695006624/aln_1
Time for processing: 0h 0m 0s 10ms
rmdb ~/project/msa/examples/demo/tmp/502615879695006624/aln_tmp_2
Time for processing: 0h 0m 0s 9ms
search cost: 44 s
expandaln ~/project/msa/examples/demo/qdb /mnt/localssd/yakunli/data/level-1/colabfold_envdb_202108/colabfold_envdb_202108_db.idx ~/project/msa/examples/demo/res_gpu /mnt/localssd/yakunli/data/level-1/colabfold_envdb_202108/colabfold_envdb_202108_db.idx ~/project/msa/examples/demo/res_exp --db-load-mode 0 --threads 64 --expansion-mode 0 -e inf --expand-filter-clusters 0 --max-seq-id 0.95
Index version: 16
Generated by: 8783404eab75833dcb865153ed2e146431649efa
ScoreMatrix: VTML80.out
Index version: 16
Generated by: 8783404eab75833dcb865153ed2e146431649efa
ScoreMatrix: VTML80.out
[=================================================================] 100.00% 1 eta -
terminate called after throwing an instance of 'std::logic_error'
what(): basic_string: construction from null is not valid
~/project/msa/bin/../bin/msa_retrieve_pipeline.bash: line 72: 4138815 Aborted mmseqs expandaln ${result_dir}/qdb ${database_path}/${DB_NAME}.idx ${result_dir}/res_gpu ${database_path}/${DB_NAME}.idx ${result_dir}/res_exp --db-load-mode 0 --threads 64 --expansion-mode 0 -e inf --expand-filter-clusters 0 --max-seq-id 0.95
@milot-mirdita this is the terminal output of the executed commands, thanks very much
@camel2000 would it be possible to upload or send me the query sequence?
@milot-mirdita thanks, this is the query fasta content
>tr|A7TBS3|A7TBS3_NEMVE Predicted protein (Fragment) OS=Nematostella vectensis GN=v1g153959 PE=4 SV=1 Split=0
VCIHTENQNQVSFYPFVLHEISVLIELTLGHLRYRLTDVPPQPNSQPDSATNYVWML
@camel2000 I cannot get that sequence to crash on my side.
I tried exactly the commands you listed above: https://github.com/sokrypton/ColabFold/issues/691#issuecomment-2896838973
Could you rerun the crashing expandaln command after running unset MMSEQS_CALL_DEPTH, so that it prints more debug information?
unset MMSEQS_CALL_DEPTH
mmseqs expandaln ~/project/msa/examples/demo/qdb /mnt/localssd/yakunli/data/level-1/colabfold_envdb_202108/colabfold_envdb_202108_db.idx ~/project/msa/examples/demo/res_gpu /mnt/localssd/yakunli/data/level-1/colabfold_envdb_202108/colabfold_envdb_202108_db.idx ~/project/msa/examples/demo/res_exp --db-load-mode 0 --threads 64 --expansion-mode 0 -e inf --expand-filter-clusters 0 --max-seq-id 0.95
Could you also pack qdb* and res_gpu* into a tar and send this to me/upload it here?
You know what, I succeeded on the uniref30_2302 dataset, but failed many times on colabfold_envdb_202108 (GPU version) @milot-mirdita
I am trying to reproduce your issue on the colabfolddb, but I am failing at that. Please try the steps above, maybe these will help to diagnose what's wrong
I am trying to (I am starting over again). When I have something new, I will let you know @milot-mirdita, thanks very much
@milot-mirdita I think I have solved all the bugs. The failure on the colabfolddb was because there was not enough space on /tmp. Only a single line of output indicated the insufficient space; it was buried among numerous log lines and was difficult to spot.
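Because the out-of-space message is easy to miss, it may help to check free space before starting the pipeline. This is a minimal sketch using the POSIX `df -Pk` output format; the function names `free_kb` and `require_free_kb` and the threshold in the usage line are hypothetical, and the amount of space actually needed depends on the database and query.

```shell
# Print the free space, in kilobytes, of the filesystem holding a directory.
# With -P, df prints one header line and one data line; column 4 is "Available".
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Return non-zero (and print a message to stderr) when the directory has
# less free space than the requested number of kilobytes.
require_free_kb() {
    dir=$1
    need_kb=$2
    have_kb=$(free_kb "$dir")
    if [ "$have_kb" -lt "$need_kb" ]; then
        echo "only ${have_kb} KB free in ${dir}, need ${need_kb} KB" >&2
        return 1
    fi
}
```

Usage: `require_free_kb /tmp 104857600 || exit 1` placed before the search commands would stop the script when less than roughly 100 GB is free on /tmp. The same check can be pointed at the tmp directory passed to mmseqs search.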