Database . does not exist
Hi, I'm encountering an issue while trying to run colabfold_search. I would appreciate any suggestions you might have for resolving it. Thank you in advance!
colabfold_search test.fasta ./colabfold_db/ test --af3-json --use-templates 1
INFO:colabfold.mmseqs.search:Running mmseqs createdb test/query.fas test/qdb --shuffle 0
createdb test/query.fas test/qdb --shuffle 0
Converting sequences
Time for merging to qdb_h: 0h 0m 0s 1ms
Time for merging to qdb: 0h 0m 0s 0ms
Database type: Aminoacid
Time for processing: 0h 0m 0s 6ms
Traceback (most recent call last):
File "/data/igem/dxxu/ColabFold_MMseqsGPU/localcolabfold/colabfold-conda/bin/colabfold_search", line 8, in
Could you post the output of "ls ./colabfold_db/"?
Sure, the output of ls ./colabfold_db/ is:
COLABDB_READY colabfold_envdb_202108_aln.tsv colabfold_envdb_202108_db colabfold_envdb_202108_db_aln colabfold_envdb_202108_db_aln.dbtype colabfold_envdb_202108_db_aln.index colabfold_envdb_202108_db.dbtype colabfold_envdb_202108_db.GPU_READY colabfold_envdb_202108_db_h colabfold_envdb_202108_db_h.dbtype colabfold_envdb_202108_db_h.index colabfold_envdb_202108_db.idx colabfold_envdb_202108_db.idx.dbtype colabfold_envdb_202108_db.idx.index colabfold_envdb_202108_db.index colabfold_envdb_202108_db.lookup colabfold_envdb_202108_db_seq colabfold_envdb_202108_db_seq.dbtype colabfold_envdb_202108_db_seq_h colabfold_envdb_202108_db_seq_h.dbtype colabfold_envdb_202108_db_seq_h.index colabfold_envdb_202108_db_seq.index colabfold_envdb_202108_h.tsv colabfold_envdb_202108_seq.tsv colabfold_envdb_202108.tar.gz colabfold_envdb_202108.tsv DOWNLOADS_READY pdb pdb100_230517 pdb100_230517.dbtype pdb100_230517.fasta.gz pdb100_230517_h pdb100_230517_h.dbtype pdb100_230517_h.index pdb100_230517.idx pdb100_230517.idx.dbtype pdb100_230517.idx.index pdb100_230517.index pdb100_230517.lookup pdb100_230517_tmp_h pdb100_230517_tmp_h.dbtype pdb100_230517_tmp_h.index pdb100_a3m.ffdata pdb100_a3m.ffindex pdb100_foldseek_230517.tar.gz PDB100_READY PDB_MMCIF_READY PDB_READY tmp1 tmp2 tmp3 uniref30_2302_aln.tsv uniref30_2302_db uniref30_2302_db_aln uniref30_2302_db_aln.dbtype uniref30_2302_db_aln.index uniref30_2302_db.dbtype uniref30_2302_db.GPU_READY uniref30_2302_db_h uniref30_2302_db_h.dbtype uniref30_2302_db_h.index uniref30_2302_db.idx uniref30_2302_db.idx.dbtype uniref30_2302_db.idx.index uniref30_2302_db.idx_mapping uniref30_2302_db.idx_taxonomy uniref30_2302_db.index uniref30_2302_db.lookup uniref30_2302_db_mapping uniref30_2302_db_seq uniref30_2302_db_seq.dbtype uniref30_2302_db_seq_h uniref30_2302_db_seq_h.dbtype uniref30_2302_db_seq_h.index uniref30_2302_db_seq.index uniref30_2302_db_taxonomy uniref30_2302_h.tsv uniref30_2302.md5sum uniref30_2302_seq.tsv uniref30_2302.tar.gz uniref30_2302.tsv UNIREF30_READY
That is a very confusing error message, but I think I understand what's going on.
Apparently we don't set a default path for the template database. You need to call colabfold_search ... --db2 pdb100_230517. db1 and db3 have default database names (uniref30_2302_db and colabfold_envdb_202108_db, respectively).
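The suggestion above can be sketched as a small Python wrapper that checks for the template database and then builds the call with --db2 set explicitly. This is only a minimal sketch: the helper names missing_db_files and build_search_command are hypothetical and not part of ColabFold, the list of required file suffixes is an assumption based on the ls output earlier in this thread, and only flags that already appear in this thread are used.

```python
from pathlib import Path

# Assumption: a database prefix is considered present when these companion
# files exist next to it, as seen in the ls output posted in this thread.
REQUIRED_SUFFIXES = ("", ".dbtype", ".index")


def missing_db_files(db_dir, db_name):
    """Return the expected files for db_name that are absent from db_dir."""
    base = Path(db_dir)
    return [
        db_name + suffix
        for suffix in REQUIRED_SUFFIXES
        if not (base / (db_name + suffix)).exists()
    ]


def build_search_command(query, db_dir, out_dir, template_db="pdb100_230517"):
    """Build a colabfold_search call that names the template database via --db2."""
    return [
        "colabfold_search", str(query), str(db_dir), str(out_dir),
        "--af3-json",
        "--use-templates", "1",
        "--db2", template_db,  # no default is set for the template database
    ]


if __name__ == "__main__":
    missing = missing_db_files("./colabfold_db", "pdb100_230517")
    if missing:
        print("Missing template database files:", ", ".join(missing))
    else:
        print(" ".join(build_search_command("test.fasta", "./colabfold_db/", "test")))
```

The command is returned as a list so it can be passed to subprocess.run without shell quoting; the script itself only prints it.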
prefilter mark_test/test/prof_res colabfold_db_GPU/pdb100_230517.idx mark_test/test/tmp2/2421078915233214753/pref_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --seed-sub-mat 'aa:VTML80.out,nucl:nucleotide.out' -k 0 --target-search-mode 0 --k-score seq:2147483647,prof:2147483647 --alph-size aa:21,nucl:5 --max-seq-len 65535 --max-seqs 300 --split 0 --split-mode 2 --split-memory-limit 0 -c 0 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 1 --diag-score 1 --exact-kmer-matching 0 --mask 1 --mask-prob 0.9 --mask-lower-case 0 --mask-n-repeat 0 --min-ungapped-score 15 --add-self-matches 0 --spaced-kmer-mode 1 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --threads 64 --compressed 0 -v 3 -s 7.5
Index version: 16
Generated by: 17.b804f
ScoreMatrix: VTML80.out
Query database size: 1 type: Profile
Estimated memory consumption: 1G
Target database size: 329605 type: Aminoacid
Invalid database read for database data file=colabfold_db_GPU/pdb100_230517.idx, database index=colabfold_db_GPU/pdb100_230517.idx.index
getData: local id (4294967295) >= db size (17)
Error: Prefilter died
Traceback (most recent call last):
File "/share/home/mark/software/ColabFold/localcolabfold/colabfold-conda/bin/colabfold_search", line 8, in
Hi, thank you for your response. I followed your suggestion and adjusted the parameters, but I'm still encountering an error, although the error message has changed. I used the following command: "colabfold_search ./test.fasta ./colabfold_db_GPU/ ./mark_test/test --gpu 1 --af3-json --use-templates 1 --db2 pdb100_230517"
I just pushed a fix (379d101a0a6378243405e962939d104e84d2085a) so that the template search also uses the GPU correctly.
Thank you very much for your response. The issue has been resolved after updating ColabFold. However, I noticed that when using the --use-templates and --af3-json parameters, the generated AlphaFold3 JSON file still contains an empty template field, and ColabFold only produces a separate .m8 file. This differs from the JSON file produced by AlphaFold3's own data pipeline. Is there a way to incorporate the template information directly into the JSON file?