Invalid database read error in colabfold_search

Open aaronkollasch opened this issue 3 years ago • 2 comments

Expected Behavior

Hello, I am trying to run batch searches against ColabFoldDB on a SLURM cluster, following the MSA instructions in the README.

Current Behavior

colabfold_search fails at the expandaln step with the error:

Invalid database read for database data file=[db_folder]/uniref30_2103_db.idx, database index=[db_folder]/uniref30_2103_db.idx.index
getData: local id (4294967295) >= db size (22)
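A note on the numbers in that message: 4294967295 is 2^32 - 1, the largest unsigned 32-bit value, which C/C++ code often uses as a "not found" sentinel. Assuming that is what happened here (an assumption, not confirmed against the MMseqs2 source), the id lookup itself failed, rather than a genuine id being slightly out of range. The arithmetic can be checked with a small sketch:

```python
# Value reported as "local id" in the error message above.
reported_id = 4294967295

# Largest value an unsigned 32-bit integer can hold.
uint32_max = 2**32 - 1

# True when the reported id is exactly the unsigned 32-bit maximum.
is_sentinel_like = reported_id == uint32_max

# The same bit pattern reinterpreted as a signed 32-bit integer.
as_signed = reported_id - 2**32 if reported_id >= 2**31 else reported_id

print(is_sentinel_like, as_signed)
```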

Full log file: colabfold_search_output.txt

Steps to Reproduce (for bugs)

  1. bash setup_databases.sh [db_folder]
     Note: mmseqs createindex was run with --split-memory-limit 128G, as mmseqs doesn't detect the SLURM job's memory limit otherwise.
  2. colabfold_search --db-load-mode 0 --mmseqs mmseqs_5185d3c/bin/mmseqs batch_1/input_sequences.fa [db_folder] batch_1/result_s8
     Input sequences: input_sequences.fa

It looks like colabfold_search uses --split-memory-limit 0 in the prefilter steps and possibly in later steps. I don't think this caused the issue, as the job only reached 53 GB of memory usage before it errored, but it would be nice to be able to set this limit to prevent the job from being killed.
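As a rough illustration of setting the limit from the job allocation, here is a sketch that derives a --split-memory-limit value from SLURM's environment. It assumes the job was submitted with --mem, so that SLURM_MEM_PER_NODE is set and holds megabytes. The helper name `split_memory_limit` and the 80% headroom are hypothetical choices, not anything colabfold_search or mmseqs provides.

```python
import os


def split_memory_limit(env, headroom_percent=80):
    """Return a value such as '200G' for --split-memory-limit, or None.

    Assumes env["SLURM_MEM_PER_NODE"] is the job's memory in megabytes.
    """
    mem_mb = env.get("SLURM_MEM_PER_NODE")
    if mem_mb is None:
        return None  # not inside a SLURM job, or --mem was not given
    usable_mb = int(mem_mb) * headroom_percent // 100
    return f"{max(usable_mb // 1024, 1)}G"


def createindex_command(db_path, tmp_dir, env=None):
    """Build (but do not run) an mmseqs createindex command line."""
    env = os.environ if env is None else env
    cmd = ["mmseqs", "createindex", db_path, tmp_dir]
    limit = split_memory_limit(env)
    if limit is not None:
        cmd += ["--split-memory-limit", limit]
    return cmd
```

With a 250 GiB allocation (256000 MB), this would yield --split-memory-limit 200G. Whether the same flag can be passed through colabfold_search is a separate question.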

Context

I'm looking to perform a batch search and the cluster jobs have a 250GiB limit, so I'm using --db-load-mode 0, but let me know if that isn't the best option.

Your Environment

  • Git commit: 2a47c6f1459fbbdb5242cbc62173f9b513813cfa
  • mmseqs commit: 5185d3cbb7af8a3122e202d47ddaaa785dc73890
  • Server: Intel Xeon CPU with AVX, 256GiB memory (jobs limited to 250GiB, and lower limits can mean faster submission times)
  • Operating system and version: CentOS 7

@thomashopf

aaronkollasch · Jul 27 '22 14:07

I recreated the index on a different machine without --split-memory-limit 128G and this error went away. Perhaps it was a one-off corruption of the index, an issue when specifying --split-memory-limit, or something specific to the cluster.

aaronkollasch · Aug 08 '22 19:08
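For anyone who suspects a corrupted index, a quick sanity check before rebuilding might look like the sketch below. It assumes the .index file is the usual MMseqs2 three-column, tab-separated layout (key, offset, length) and that the data lives in a single file rather than numbered splits. The function name `check_index` is hypothetical, and passing this check would not prove the index is valid.

```python
import os


def check_index(data_path, index_path):
    """Return a list of problems found in a three-column .index file."""
    data_size = os.path.getsize(data_path)
    problems = []
    with open(index_path) as fh:
        for line_no, line in enumerate(fh, start=1):
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 3:
                problems.append(f"line {line_no}: expected 3 columns, got {len(fields)}")
                continue
            try:
                key, offset, length = (int(f) for f in fields)
            except ValueError:
                problems.append(f"line {line_no}: non-integer field")
                continue
            # Each entry must lie entirely inside the data file.
            if offset < 0 or length < 0 or offset + length > data_size:
                problems.append(f"line {line_no}: entry {key} extends past the data file")
    return problems
```

It could be called as check_index("[db_folder]/uniref30_2103_db.idx", "[db_folder]/uniref30_2103_db.idx.index"), with [db_folder] replaced by the real path. An empty list means no entry points outside the data file.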