'Invalid database read for database data' on running expandaln
The expandaln command fails to read the index properly and produces an 'Invalid database read for database data' error.
Expected Behavior
The command runs without error messages.
Current Behavior
The command fails immediately with the following error message:
Invalid database read for database data file=db/human.idx, database index=db/human.idx.index
getData: local id (4294967295) >= db size (22)
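A note on the number in that message: 4294967295 is 2^32 - 1, the largest unsigned 32-bit value, which is also what -1 becomes when stored in an unsigned 32-bit integer. Values like this are often used as a "not found" marker, so one possible reading is that a lookup failed before the bounds check against the 22 entries. That reading is an assumption and has not been checked against the MMseqs2 source. The arithmetic itself can be verified with a minimal Python sketch (variable names are illustrative only):

```python
# Values copied from the error message above.
reported_id = 4294967295  # "local id"
db_size = 22              # "db size"

# Largest value an unsigned 32-bit integer can hold.
uint32_max = 2**32 - 1

# Bit pattern of -1 reinterpreted as an unsigned 32-bit integer.
minus_one_as_uint32 = (-1) & 0xFFFFFFFF

print(reported_id == uint32_max)
print(reported_id == minus_one_as_uint32)
print(reported_id >= db_size)  # the condition reported by getData
```

All three comparisons hold, which is consistent with (but does not prove) the sentinel interpretation.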
Steps to Reproduce (for bugs)
mkdir db
mkdir job
mmseqs createdb uniprotkb_human.fasta db/human
mmseqs createindex db/human db/tmp --remove-tmp-files 1 --check-compatible 1
mmseqs createdb query.fasta job/qdb
mmseqs search job/qdb db/human job/res job/tmp1 --num-iterations 3 --db-load-mode 2 -a --k-score 'seq:96,prof:80' -e 0.1 --max-seqs 10000
mmseqs mvdb job/tmp1/latest/profile_1 job/prof_res
mmseqs lndb job/qdb_h job/prof_res_h
# Command which fails:
mmseqs expandaln job/qdb db/human.idx job/res db/human.idx job/res_exp --db-load-mode 1 --expansion-mode 0 -e inf --expand-filter-clusters 1 --max-seq-id 0.95
MMseqs Output (for bugs)
createdb:
MMseqs Version: 8799829d213f31b647fc69e0572a0c828c5aaf63
Database type 0
Shuffle input database true
Createdb mode 0
Write lookup file 1
Offset of numeric ids 0
Compressed 0
Verbosity 3
Converting sequences
[79690] 0s 233ms
Time for merging to human_h: 0h 0m 0s 24ms
Time for merging to human: 0h 0m 0s 53ms
Database type: Aminoacid
Time for processing: 0h 0m 0s 472ms
createindex:
createindex db/human db/tmp --remove-tmp-files 1 --check-compatible 1
MMseqs Version: 8799829d213f31b647fc69e0572a0c828c5aaf63
Seed substitution matrix aa:VTML80.out,nucl:nucleotide.out
k-mer length 0
Alphabet size aa:21,nucl:5
Compositional bias 1
Compositional bias 1
Max sequence length 65535
Max results per query 300
Mask residues 1
Mask residues probability 0.9
Mask lower case residues 0
Spaced k-mers 1
Spaced k-mer pattern
Sensitivity 7.5
k-score seq:0,prof:0
Check compatible 1
Search type 0
Split database 0
Split memory limit 0
Verbosity 3
Threads 4
Min codons in orf 30
Max codons in length 32734
Max orf gaps 2147483647
Contig start mode 2
Contig end mode 2
Orf start mode 1
Forward frames 1,2,3
Reverse frames 1,2,3
Translation table 1
Translate orf 0
Use all table starts false
Offset of numeric ids 0
Create lookup 0
Compressed 0
Add orf stop false
Overlap between sequences 0
Sequence split mode 1
Header split mode 0
Strand selection 1
Remove temporary files true
indexdb db/human db/human --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -k 0 --alph-size aa:21,nucl:5 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-seq-len 65535 --max-seqs 300 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --spaced-kmer-mode 1 -s 7.5 --k-score seq:0,prof:0 --check-compatible 1 --search-type 0 --split 0 --split-memory-limit 0 -v 3 --threads 4
Estimated memory consumption: 1G
Write VERSION (0)
Write META (1)
Write SCOREMATRIX3MER (4)
Write SCOREMATRIX2MER (3)
Write SCOREMATRIXNAME (2)
Write SPACEDPATTERN (23)
Write GENERATOR (22)
Write DBR1INDEX (5)
Write DBR1DATA (6)
Write HDR1INDEX (18)
Write HDR1DATA (19)
Index table: counting k-mers
[=================================================================] 100.00% 79.74K 2s 947ms
Index table: Masked residues: 1262029
Index table: fill
[=================================================================] 100.00% 79.74K 4s 125ms
Index statistics
Entries: 25991856
DB size: 637 MB
Avg k-mer size: 0.406123
Top 10 k-mers
VMEYLV 439
QRLRML 421
LYDMNY 403
TFDAFS 367
YRVLYR 257
VAESEW 236
TGYKLS 202
GEVLSS 200
VTSSSS 199
TFDAFT 194
Write ENTRIES (9)
Write ENTRIESOFFSETS (10)
Write SEQINDEXDATASIZE (15)
Write SEQINDEXSEQOFFSET (16)
Write SEQINDEXDATA (14)
Write ENTRIESNUM (12)
Write SEQCOUNT (13)
Time for merging to human.idx: 0h 0m 0s 0ms
Time for processing: 0h 0m 11s 156ms
expandaln:
expandaln job/qdb db/human.idx job/res db/human.idx job/res_exp --db-load-mode 1 --expansion-mode 0 -e inf --expand-filter-clusters 1 --max-seq-id 0.95
MMseqs Version: 8799829d213f31b647fc69e0572a0c828c5aaf63
Expansion mode 0
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Max sequence length 65535
Score bias 0
Compositional bias 1
Compositional bias 1
E-value threshold inf
Seq. id. threshold 0
Coverage threshold 0
Coverage mode 0
Pseudo count mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Expand filter clusters 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.95
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Preload mode 1
Compressed 0
Threads 4
Verbosity 3
Index version: 16
Generated by: 8799829d213f31b647fc69e0572a0c828c5aaf63
ScoreMatrix: VTML80.out
Index version: 16
Generated by: 8799829d213f31b647fc69e0572a0c828c5aaf63
ScoreMatrix: VTML80.out
Invalid database read for database data file=db/human.idx, database index=db/human.idx.index
getData: local id (4294967295) >= db size (22)
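To compare the "db size (22)" in the message with what is actually on disk, the entries in the index file named in the error can be counted. The sketch below assumes the usual MMseqs2 layout in which each line of a `.index` file is one tab-separated entry (key, offset, length). `count_index_entries` is a hypothetical helper written for this report, not part of MMseqs2, and the path is the one from the error message above.

```python
from pathlib import Path


def count_index_entries(index_path):
    """Return (entry_count, keys) for a tab-separated .index file."""
    keys = []
    with Path(index_path).open() as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # Skip blank lines; assumed layout is key, offset, length.
            if fields and fields[0]:
                keys.append(fields[0])
    return len(keys), keys


if __name__ == "__main__":
    index_file = Path("db/human.idx.index")  # path taken from the error message
    if index_file.exists():
        count, keys = count_index_entries(index_file)
        print(f"{count} entries, keys: {keys}")
    else:
        print(f"{index_file} not found")
```

If the count printed here differs from 22, or an expected key is missing, that would be a useful detail to add to this report.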
Context
I am attempting to recreate the functionality in https://github.com/soedinglab/MMseqs2-App/blob/master/backend/worker.go
Your Environment
- Git commit used: 8799829d213f31b647fc69e0572a0c828c5aaf63
- MMseqs version used: 8799829d213f31b647fc69e0572a0c828c5aaf63
- Fails in two environments:
  - MacBook Pro, M1 (arm), 64GB memory
  - Ubuntu server, 8GB memory
I also get the "Invalid database read for database data file" error from expandaln when called by colabfold_search.
(I originally posted this on Issue 64 before I realized that that Issue was closed.)
Invalid database read for database data file=/home/username/project/my_local_DB/target_DB.idx, database index=/home/username/project/my_local_DB/target_DB.idx.index
getData: local id (4294967295) >= db size (22)
I created target_DB from target.fasta, which has 142 records in it:
pwd
# /home/username/project/my_local_DB
mmseqs createdb target.fasta target_DB
mmseqs createindex target_DB tmp_createindex --threads 96
indexdb target_DB target_DB --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -k 0 --alph-size aa:21,nucl:5 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-seq-len 65535 --max-seqs 300 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --spaced-kmer-mode 1 -s 7.5 --k-score seq:0,prof:0 --check-compatible 0 --search-type 0 --split 0 --split-memory-limit 0 -v 3 --threads 96
Then I ran colabfold_search. Output is below.
CUDA_VISIBLE_DEVICES='0' colabfold_search \
  -s '8' \
  --db1 'target_DB' \
  --use-templates '0' \
  --db2 '' \
  --use-env '0' \
  --db3 '' \
  --filter '1' \
  --mmseqs 'mmseqs' \
  --expand-eval '1.7e+308' \
  --align-eval '10' \
  --diff '3000' \
  --qsc '-20.0' \
  --max-accept '1000000' \
  --db-load-mode '2' \
  --threads '96' \
  query.fasta \
  /home/username/project/my_local_DB \
  result_query_20230412_142303
createdb result_query_20230412_142303/query.fas result_query_20230412_142303/qdb --shuffle 0
search result_query_20230412_142303/qdb /home/username/project/my_local_DB/target_DB result_query_20230412_142303/res result_query_20230412_142303/tmp --threads 96 --num-iterations 3 --db-load-mode 2 -a -s 8 -e 0.1 --max-seqs 10000
prefilter result_query_20230412_142303/qdb /home/username/project/my_local_DB/target_DB.idx result_query_20230412_142303/tmp/18292001434761310910/pref_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -s 8 -k 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 65535 --max-seqs 10000 --split 0 --split-mode 2 --split-memory-limit 0 -c 0 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 1 --diag-score 1 --exact-kmer-matching 0 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 2 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 96 --compressed 0 -v 3
align result_query_20230412_142303/qdb /home/username/project/my_local_DB/target_DB.idx result_query_20230412_142303/tmp/18292001434761310910/pref_0 result_query_20230412_142303/tmp/18292001434761310910/aln_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 1 --alignment-mode 2 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.1 --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 2 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 1 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 96 --compressed 0 -v 3
result2profile result_query_20230412_142303/qdb /home/username/project/my_local_DB/target_DB.idx result_query_20230412_142303/tmp/18292001434761310910/aln_0 result_query_20230412_142303/tmp/18292001434761310910/profile_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -e 0.1 --mask-profile 1 --e-profile 0.1 --comp-bias-corr 1 --comp-bias-corr-scale 1 --wg 0 --allow-deletion 0 --filter-msa 1 --filter-min-enable 0 --max-seq-id 0.9 --qid '0.0' --qsc -20 --cov 0 --diff 1000 --pseudo-cnt-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --db-load-mode 2 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --gap-pc 10 --threads 96 --compressed 0 -v 3
subtractdbs result_query_20230412_142303/tmp/18292001434761310910/pref_tmp_1 result_query_20230412_142303/tmp/18292001434761310910/aln_0 result_query_20230412_142303/tmp/18292001434761310910/pref_1 --threads 96 --e-profile 0.1 -e 0.1 --compressed 0 -v 3
align result_query_20230412_142303/tmp/18292001434761310910/profile_0 /home/username/project/my_local_DB/target_DB.idx result_query_20230412_142303/tmp/18292001434761310910/pref_1 result_query_20230412_142303/tmp/18292001434761310910/aln_tmp_1 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 1 --alignment-mode 2 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.1 --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 2 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 96 --compressed 0 -v 3
mergedbs result_query_20230412_142303/tmp/18292001434761310910/profile_0 result_query_20230412_142303/tmp/18292001434761310910/aln_1 result_query_20230412_142303/tmp/18292001434761310910/aln_0 result_query_20230412_142303/tmp/18292001434761310910/aln_tmp_1
rmdb result_query_20230412_142303/tmp/18292001434761310910/aln_0
rmdb result_query_20230412_142303/tmp/18292001434761310910/aln_tmp_1
result2profile result_query_20230412_142303/tmp/18292001434761310910/profile_0 /home/username/project/my_local_DB/target_DB.idx result_query_20230412_142303/tmp/18292001434761310910/aln_1 result_query_20230412_142303/tmp/18292001434761310910/profile_1 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -e 0.1 --mask-profile 1 --e-profile 0.1 --comp-bias-corr 1 --comp-bias-corr-scale 1 --wg 0 --allow-deletion 0 --filter-msa 1 --filter-min-enable 0 --max-seq-id 0.9 --qid '0.0' --qsc -20 --cov 0 --diff 1000 --pseudo-cnt-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --db-load-mode 2 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --gap-pc 10 --threads 96 --compressed 0 -v 3
prefilter result_query_20230412_142303/tmp/18292001434761310910/profile_1 /home/username/project/my_local_DB/target_DB.idx result_query_20230412_142303/tmp/18292001434761310910/pref_tmp_2 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -s 8 -k 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 65535 --max-seqs 10000 --split 0 --split-mode 2 --split-memory-limit 0 -c 0 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 1 --diag-score 1 --exact-kmer-matching 0 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 2 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 96 --compressed 0 -v 3
subtractdbs result_query_20230412_142303/tmp/18292001434761310910/pref_tmp_2 result_query_20230412_142303/tmp/18292001434761310910/aln_1 result_query_20230412_142303/tmp/18292001434761310910/pref_2 --threads 96 --e-profile 0.1 -e 0.1 --compressed 0 -v 3
align result_query_20230412_142303/tmp/18292001434761310910/profile_1 /home/username/project/my_local_DB/target_DB.idx result_query_20230412_142303/tmp/18292001434761310910/pref_2 result_query_20230412_142303/tmp/18292001434761310910/aln_tmp_2 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 1 --alignment-mode 2 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.1 --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 2 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --threads 96 --compressed 0 -v 3
mergedbs result_query_20230412_142303/tmp/18292001434761310910/profile_1 result_query_20230412_142303/res result_query_20230412_142303/tmp/18292001434761310910/aln_1 result_query_20230412_142303/tmp/18292001434761310910/aln_tmp_2
rmdb result_query_20230412_142303/tmp/18292001434761310910/aln_1
rmdb result_query_20230412_142303/tmp/18292001434761310910/aln_tmp_2
expandaln result_query_20230412_142303/qdb /home/username/project/my_local_DB/target_DB.idx result_query_20230412_142303/res /home/username/project/my_local_DB/target_DB.idx result_query_20230412_142303/res_exp --db-load-mode 2 --threads 96 --expansion-mode 0 -e 1.7976931348623157e+308 --expand-filter-clusters 1 --max-seq-id 0.95
MMseqs Version: 67949d702dbfc6e5d54fdd0f14a9ab6740f11c32
Expansion mode 0
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Max sequence length 65535
Score bias 0
Compositional bias 1
Compositional bias 1
E-value threshold 1.79769e+308
Seq. id. threshold 0
Coverage threshold 0
Coverage mode 0
Pseudo count mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Expand filter clusters 1
Use filter only at N seqs 0
Maximum seq. id. threshold 0.95
Minimum seq. id. 0.0
Minimum score per column -20
Minimum coverage 0
Select N most diverse seqs 1000
Preload mode 2
Compressed 0
Threads 96
Verbosity 3
Index version: 16
Generated by: 67949d702dbfc6e5d54fdd0f14a9ab6740f11c32
ScoreMatrix: VTML80.out
Index version: 16
Generated by: 67949d702dbfc6e5d54fdd0f14a9ab6740f11c32
ScoreMatrix: VTML80.out
Invalid database read for database data file=/home/username/project/my_local_DB/target_DB.idx, database index=/home/username/project/my_local_DB/target_DB.idx.index
getData: local id (4294967295) >= db size (22)
Traceback (most recent call last):
  File "/home/username/project/colabfold_batch/colabfold-conda/bin/colabfold_search", line 8, in <module>
    sys.exit(main())
  File "/home/username/project/colabfold_batch/colabfold-conda/lib/python3.7/site-packages/colabfold/mmseqs/search.py", line 444, in main
    threads=args.threads,
  File "/home/username/project/colabfold_batch/colabfold-conda/lib/python3.7/site-packages/colabfold/mmseqs/search.py", line 86, in mmseqs_search_monomer
    run_mmseqs(mmseqs, ["expandaln", base.joinpath("qdb"), dbbase.joinpath(f"{uniref_db}{dbSuffix1}"), base.joinpath("res"), dbbase.joinpath(f"{uniref_db}{dbSuffix2}"), base.joinpath("res_exp"), "--db-load-mode", str(db_load_mode), "--threads", str(threads)] + expand_param)
  File "/home/username/project/colabfold_batch/colabfold-conda/lib/python3.7/site-packages/colabfold/mmseqs/search.py", line 23, in run_mmseqs
    subprocess.check_call([mmseqs] + params)
  File "/home/username/project/colabfold_batch/colabfold-conda/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '[PosixPath('mmseqs'), 'expandaln', PosixPath('result_query_20230412_142303/qdb'), PosixPath('/home/username/project/my_local_DB/target_DB.idx'), PosixPath('result_query_20230412_142303/res'), PosixPath('/home/username/project/my_local_DB/target_DB.idx'), PosixPath('result_query_20230412_142303/res_exp'), '--db-load-mode', '2', '--threads', '96', '--expansion-mode', '0', '-e', '1.7976931348623157e+308', '--expand-filter-clusters', '1', '--max-seq-id', '0.95']' returned non-zero exit status 1.
target_DB is a brand new database; I have neither added nor deleted records since its creation.
I am working on a Lambda server running Ubuntu:
Linux xyz-lambda02 5.4.0-144-generic #161-Ubuntu SMP Fri Feb 3 14:49:04 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Please let me know if I can help with debugging.
Thank you. And thanks for mmseqs.
I got the same error, but in a different place: I ran a local ColabFold API server, and the error message is:
Invalid database read for database data file=/data/colabFold/MsaServer/databases/uniref30_2202_db.idx, database index=/data/colabFold/MsaServer/databases/uniref30_2202_db.idx.index
getData: local id (4294967295) >= db size (22)
Thanks