Running colabfold_search returns "terminate called recursively" and terminates
Hi,
I have tried running colabfold_search on a GPU node (specifically, 3 A100-80GB GPUs). I ran the database setup as instructed: GPU=1 ./setup_databases.sh path/to/db. The setup completed successfully.
However, I can't seem to get the search to run to completion. I have attached the log file from the run.
Here is the code I am using:
# start a gpu server to reduce the latency when performing msa searches
mmseqs gpuserver /user/ColabFold-official-GPU/colabfold_envdb_202108_db --max-seqs 10000 --db-load-mode 2 --prefilter-mode 1 &
PID1=$!
mmseqs gpuserver /user/ColabFold-official-GPU/uniref30_2302_db --max-seqs 10000 --db-load-mode 2 --prefilter-mode 1 &
PID2=$!
time colabfold_search --mmseqs mmseqs inputs/ /user/ColabFold-official-GPU outputs/ --gpu 1 --gpu-server 1 --threads ${N_CPUS} --db-load-mode 2
# Stop the server(s) when done:
kill $PID1
kill $PID2
- Also, I don't see the kX speedup that the GPU option is supposed to offer. The run in the log above took 2 hours for a few hundred sequences (all in .fasta format), and it still did not complete. I have tried both --db-load-mode 0 and 2.
Kindly advise on how to resolve this. Thanks
I think I have already fixed this crash in git mmseqs. Please try downloading precompiled binaries from: https://mmseqs.com/latest/
And pass the path to the mmseqs binary to colabfold_search via its --mmseqs parameter.
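For example, if the archive is extracted to /path/to/mmseqs (a placeholder, adjust to wherever you unpack it), the call would look roughly like this sketch of your command from above:

# point colabfold_search at the freshly downloaded binary instead of the one on $PATH
time colabfold_search --mmseqs /path/to/mmseqs/bin/mmseqs inputs/ /user/ColabFold-official-GPU outputs/ --gpu 1 --gpu-server 1 --threads ${N_CPUS} --db-load-mode 2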
If the newer binary still happens to crash, would it be possible to upload the FASTA files with the queries here (or otherwise send them to me) so I can try to reproduce the issue?
The speed issue looks like it's primarily from the databases being loaded from slow disks. If your system has sufficient RAM, you can try to pass --db-load-mode 0 to the mmseqs gpuserver calls (but not to colabfold_search; that should still be run with mode 2).
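A sketch of that, based on your commands above (only the --db-load-mode values on the gpuserver calls change; the mmseqs path is the placeholder from before):

# gpuservers keep the databases resident in RAM (mode 0)
mmseqs gpuserver /user/ColabFold-official-GPU/colabfold_envdb_202108_db --max-seqs 10000 --db-load-mode 0 --prefilter-mode 1 &
mmseqs gpuserver /user/ColabFold-official-GPU/uniref30_2302_db --max-seqs 10000 --db-load-mode 0 --prefilter-mode 1 &
# colabfold_search itself still uses mode 2
time colabfold_search --mmseqs /path/to/mmseqs/bin/mmseqs inputs/ /user/ColabFold-official-GPU outputs/ --gpu 1 --gpu-server 1 --threads ${N_CPUS} --db-load-mode 2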
If you are doing a single run with many queries (hundreds, thousands, or more), you could instead not start the gpuservers at all, let colabfold_search handle loading the databases itself, and omit the --db-load-mode parameter.
The gpuserver is meant for running many fast single queries in succession.
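For the batch case, a minimal sketch (same placeholder binary path as above; no gpuserver processes, no --gpu-server, no --db-load-mode):

time colabfold_search --mmseqs /path/to/mmseqs/bin/mmseqs inputs/ /user/ColabFold-official-GPU outputs/ --gpu 1 --threads ${N_CPUS}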
Thanks for the clarification and suggestions.
I am now running colabfold_search with the precompiled binary from mmseqs-linux-gpu.tar.gz, with --db-load-mode omitted. I am also not starting the gpuserver.
I will let you know of the outcome.
Thanks again.
Also, have there been reported cases where the MSAs from the ColabFold MSA server are substantially different from the ones a user computes on their local machine? If so, does this have any implications for downstream tasks like Boltz structure predictions?
The GPU implementation is a different algorithm, so different results are expected. From our benchmarking, purely in terms of search sensitivity, the GPU algorithm should be a bit more sensitive.
For structure prediction it should be about the same.
I just completed the run with the precompiled binary, with --db-load-mode not set. I have attached the log file.
The outcomes are:
1. It took 4.5 hours to complete the run.
2. Even then, the attached log contains more than 1M lines saying "Missing alignments for sequence k". The query folder had 147 FASTA files, yet final outputs (.a3m format) were produced for only 99 of them.
3. GPU utilization was very low (almost 0%). I had 3 NVIDIA A100-SXM4-80GB GPUs. Most of the work seems to have been done on the CPU: it used 50 GB of the requested 80 GB of CPU memory. I had 40 CPUs, implying 40 threads.
4. Is it also important to re-run the GPU database setup using the precompiled binary above before running colabfold_search?
NB: I am running this on my institution's HPC cluster using the SLURM scheduler.
Hopefully, I can get some assistance on getting this to work. Thanks.
You shouldn't need to rebuild the databases with the new binary. However, I think your colabfold_envdb_202108_db is broken or incomplete in some way, which is likely causing issue 2).
From your log, it's spending the vast majority of the runtime on the cluster expansion step (expandaln). The GPU part finishes in seconds/minutes. I am not sure why it's so extremely slow. How much system/CPU RAM does the compute node have?
Ideally, I would like to try to reproduce the speed issue locally. Would it be possible to send me this FASTA input file? Per email would also be fine, if you don't want to upload it here.
Hi, here's the fasta file.
I realized that both the CPU and GPU versions returned MSAs for only 99/147 queries, meaning there may be an issue with the database setup, as you hinted earlier.
If possible, can you please run the queries with both the CPU and GPU versions? Thanks for the assistance.
Please forgive my ignorance. Is there a way to run mmseqs with a single FASTA file containing many proteins, where each of them is treated as an independent protein rather than all of them together forming a multimer?
The log files I attached earlier were obtained by running colabfold_search against an input directory containing 147 separate files.
Now, when I merge all of them into one giant file, I get this warning: WARNING:colabfold.input:More than one sequence in /tmp/8296506/inputs/proteins.fasta, ignoring all but the first sequence
I am thinking that if the number of proteins grows large enough (into the millions), then having a few files, each containing a chunk of proteins, would be better than storing >1M small files. On shared resources such as an HPC environment, one may hit the file-count limit quickly even though the disk quota itself stays mostly unused; whichever of the two limits (file count or disk quota) is exceeded first locks users out of the resource until they resolve it.
I also think this is important since the eventual outputs (which I believe we don't have control over) will contain >1M .a3m MSA files, in addition to >1M query FASTA files, so addressing this would lead to substantial gains.
Kindly share some thoughts on this.
EDIT [RESOLUTION]: The CSV file input option handles this scenario. Thanks.
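In case it helps someone else, here is a minimal sketch of such a CSV input, assuming the id,sequence column layout accepted by the ColabFold query parser (the ids and sequences below are made up):

proteins.csv:
id,sequence
protein_0001,MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ
protein_0002,MSDNGPQNQRNAPRITFGGPSDSTGSNQNGERS

time colabfold_search --mmseqs /path/to/mmseqs/bin/mmseqs proteins.csv /user/ColabFold-official-GPU outputs/ --gpu 1 --threads ${N_CPUS}

Each row should then be treated as an independent single-chain query rather than all rows together forming one multimer.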
The CPU database was set up using MMSEQS_NO_INDEX=1.
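For reference, that corresponds to an invocation along these lines (a sketch; path/to/db_cpu is a placeholder for the CPU database directory):

# skip building the precomputed index files during database setup
MMSEQS_NO_INDEX=1 ./setup_databases.sh path/to/db_cpu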
Have you benchmarked the colabfold_search GPU version, or how fast should it be? I am running into a colabfold_search speed issue too: querying a single FASTA takes about 10 minutes using the colabfold_search GPU server.
Can you clarify a bit, please? Thanks.
@yank666 Okay, thanks for the information. Can you share the number of sequences you queried and some stats like the min, max and average sequence lengths?
Thanks.
I have re-set up my GPU database using the precompiled binary from mmseqs-linux-gpu.tar.gz. This time around, I was lucky to get a 1 TB RAM machine on my institution's HPC and decided to cache the *.idx files in RAM. I was able to query MSAs for the 147 sequences in 41 minutes. The run where I don't load the *.idx files into RAM is still going (>7 hours now).
- Also, the GPU utilization was very low (<1%), meaning most of the search happens on the CPU, so I am still wondering what the real contribution of the GPU is.
- I guess my best bet is to rely on machines with huge RAM to query sequences in a reasonable time.
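Regarding the *.idx caching mentioned above: one simple way to cache the index files in RAM (just a sketch of the idea, assuming the node has enough free memory to hold them in the page cache) is to read them once before starting the search:

# pre-warm the Linux page cache so later accesses avoid the slow shared filesystem
cat /user/ColabFold-official-GPU/*.idx > /dev/null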