libmarv initialization error during GPU-accelerated search against ColabFold databases using Docker
Summary
When running a GPU-accelerated search using the mmseqs2 Docker image against the colabfold_envdb_202108 or uniref30_2302_db databases, I see the following error:
CUDA error: initialization error : /opt/build/lib/libmarv/src/marv.cu, line 85
Error: Prefilter died
That particular line is involved in getting the CUDA device count, so maybe it has something to do with seeing the GPUs on the instance?
Environment
- AWS g5.8xlarge EC2 instance
- AMD EPYC 7R32 processor
- 32x vCPU
- 128 GiB Memory
- 1x NVIDIA A10G accelerator (24 GiB VRAM)
- Deep Learning OSS Nvidia Driver AMI GPU PyTorch 2.5.1 (Amazon Linux 2023) 20250107
- Linux kernel 6.1.119-129.201.amzn2023.x86_64
- CUDA Version: 12.6
- NVIDIA Driver Version: 560.35.03
- PyTorch 2.5.1
- Container ghcr.io/soedinglab/mmseqs2:17-b804f-cuda12
- Pulled Wednesday, January 22, 2025
Steps to reproduce
mkdir data
wget https://www.rcsb.org/fasta/entry/1UTN -O data/1utn.fasta
wget https://wwwuser.gwdg.de/~compbiol/colabfold/colabfold_envdb_202108.tar.gz
tar -xzvf colabfold_envdb_202108.tar.gz -C data
rm colabfold_envdb_202108.tar.gz
docker pull ghcr.io/soedinglab/mmseqs2:17-b804f-cuda12
docker run -it --rm --gpus all -v "$(pwd)/data:/home/data" ghcr.io/soedinglab/mmseqs2:17-b804f-cuda12 tsv2exprofiledb "/home/data/colabfold_envdb_202108" "/home/data/targetDB" --gpu 1
docker run -it --rm --gpus all -v "$(pwd)/data:/home/data" ghcr.io/soedinglab/mmseqs2:17-b804f-cuda12 createdb "/home/data/1utn.fasta" "/home/data/queryDB"
docker run -it --rm --gpus all -v "$(pwd)/data:/home/data" ghcr.io/soedinglab/mmseqs2:17-b804f-cuda12 search "/home/data/queryDB" "/home/data/targetDB" "/home/data/result" "/home/data/tmp" --num-iterations 3 --db-load-mode 2 -a -e 0.1 --max-seqs 10000 --gpu 1 --prefilter-mode 1
Things I've tried
✅ SUCCESS: Run search with --gpu 0 (i.e. turn off GPU acceleration)
⛔ ERROR: Set the CUDA_VISIBLE_DEVICES="0" environment variable
⛔ ERROR: Use the ghcr.io/soedinglab/mmseqs2:latest-cuda12 container
⛔ ERROR: Clone the GitHub repo and build the Dockerfile
⛔ ERROR: Modify the Dockerfile to COPY /opt/build/lib/libmarv from the build stage
⛔ ERROR: Modify the Dockerfile to use the precompiled binary
⛔ ERROR: Run using a g5.12xlarge (4x A10G GPUs instead of 1x)
Does the precompiled binary work?
wget https://mmseqs.com/latest/mmseqs-linux-gpu.tar.gz
Maybe there is something wrong with the Docker setup.
@milot-mirdita Yeah, tried that too and got the same result. For my own sanity, have you been able to successfully run GPU search against the colabfold profile databases?
Does the following script work on the A10G instance:
wget https://mmseqs.com/latest/mmseqs-linux-gpu.tar.gz
tar xzvf mmseqs-linux-gpu.tar.gz
wget https://raw.githubusercontent.com/soedinglab/MMseqs2/refs/heads/master/examples/QUERY.fasta
./mmseqs/bin/mmseqs easy-search QUERY.fasta QUERY.fasta res tmp --gpu 1
If not, can you try this script on an L4- or L40S-based instance? We did most of our testing on those.
Is CUDA_VISIBLE_DEVICES set to some odd value?
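For example, comparing what the host shell and the container see (a rough check, assuming sh is available in the image):
# on the host
echo "host CUDA_VISIBLE_DEVICES='${CUDA_VISIBLE_DEVICES}'"
# inside the container (one-off check with the same image)
docker run --rm --gpus all --entrypoint sh ghcr.io/soedinglab/mmseqs2:17-b804f-cuda12 \
    -c 'echo "container CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"'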
Update: The script you shared generates the same error that I saw before, both from within the ghcr.io/soedinglab/mmseqs2:17-b804f-cuda12 container and directly on the host
I'll try it on an L4 and L40S here in a bit and report back
Good news! The script works on both the L4 and L40S, both directly on the host and from the container. Maybe there's an issue running on Ampere?
My original script works as well
Closing this for now, but you may want to update the wiki to strongly encourage Lovelace-gen GPUs for best results
Reopening as we need to keep investigating. MMseqs2-GPU should work even on Turing (albeit rather slowly). Ampere and newer should all work fine.
I tried on a 2080 Ti (Turing, CUDA compute capability 7.5), A5000 (Ampere, 8.6), 4090 (Ada Lovelace, 8.9), and L40S (Ada Lovelace, 8.9). Works everywhere. Did you also try on an A100? Does this only happen on an A10G? Would it be possible to give me temporary access to an A10G machine?
Ok, I just tried on an A100 and it also works fine, both inside and outside the container. So, at least so far, it seems specific to an A10G.
It looks like the list of CUDA architectures passed to the compiler in the Docker container includes the one for the A10G (8.6), so that's not the problem. I can't think of anything else that would be A10G-specific, but we're just about at the edge of my depth on CUDA.
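One way to double-check which SM architectures actually made it into a given mmseqs binary, rather than reading the Dockerfile, is cuobjdump from the CUDA toolkit (the binary path below is an assumption; point it at whichever mmseqs you are running):
# list the embedded cubin architectures; sm_86 must appear for the A10G to be covered
cuobjdump --list-elf ./mmseqs/bin/mmseqs | grep -o 'sm_[0-9]*' | sort -u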
One more update. I switched over to a Ubuntu AMI running CUDA 12.4 on an A10G and it worked. I've seen some reports in the past of CUDA version inconsistencies when compiling on an A100 vs. an A10G using Amazon Linux, so it's probably something to do with that. However, since (A) it works on a g5 running Ubuntu, and (B) it works on a g6 and p4 running Amazon Linux, I'm satisfied. No edits necessary to the wiki.
Thanks! Can you please share the Driver version used in the last run with CUDA 12.4? If I understood correctly the issues are coming from Driver version 560.35.03 + CUDA 12.6?
Yep, that's right. The one that worked on A10G was 550.144.03 + CUDA 12.4
@brianloyal would you mind sharing the AMI (I didn't find a CUDA 12.4 one on AWS) and how you ultimately started the Docker container?
Sure thing - this was the winner https://aws.amazon.com/releasenotes/aws-deep-learning-ami-gpu-pytorch-2-5-ubuntu-22-04/
thank you @brianloyal !
When deploying with that 12.4 AMI and the Docker image you used (ghcr.io/soedinglab/mmseqs2:17-b804f-cuda12), it returns an error saying that image requires CUDA 12.6:
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running prestart hook #0: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: requirement error: unsatisfied condition: cuda>=12.6, please update your driver to a newer version, or use an earlier cuda container: unknown
- Did you just set NVIDIA_DISABLE_REQUIRE=true and it still worked (roughly as sketched below), or did you use other mmseqs/AMI versions?
- Did you end up using a g6.8xlarge?
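NVIDIA_DISABLE_REQUIRE is an nvidia-container-runtime switch, so it would need to be passed into the container, roughly like this (untested here; mount path taken from the original report):
# bypass the cuda>=12.6 requirement baked into the image and just check that mmseqs starts
docker run -it --rm --gpus all -e NVIDIA_DISABLE_REQUIRE=true \
    -v "$(pwd)/data:/home/data" ghcr.io/soedinglab/mmseqs2:17-b804f-cuda12 version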
I have the same problem on an Nvidia H100 (driver 535.230.02 + CUDA 12.2) when using colabfold_search. The MMseqs2 version is a2815df9a6c6da173589fb65b3f71639ea08336d.
What error are you getting @Little-Ghost? The same docker init error as the post above you? That doesn't look like an issue with MMseqs2-GPU, especially since MMseqs2-GPU doesn't require CUDA to be installed at all, only the nvidia-driver. If you are using docker, could you try if it works without docker?
I was using colabfold_search in GPU mode. The command is like:
colabfold_search --gpu 1 batch_test.fasta ~/ColabFold/dbs msas
or
colabfold_search --gpu 1 --gpu-server 1 batch_test.fasta ~/ColabFold/dbs msas
When I submitted the script to a computing node with an Nvidia H100 GPU, it gave the following error, whereas with an Nvidia A100 GPU the program ran well and produced normal a3m results.
The complete error message is:
Index version: 16
Generated by: a2815df9a6c6da173589fb65b3f71639ea08336d
ScoreMatrix: VTML80.out
CUDA error: invalid device ordinal : /home/MMseqs2/lib/libmarv/src/marv.cu, line 85
Error: Prefilter died
Traceback (most recent call last):
  File "/home/miniconda3/envs/colabfold/bin/colabfold_search", line 8, in <module>
    sys.exit(main())
  File "/home/miniconda3/envs/colabfold/lib/python3.9/site-packages/colabfold/mmseqs/search.py", line 450, in main
    mmseqs_search_monomer(
  File "/home/miniconda3/envs/colabfold/lib/python3.9/site-packages/colabfold/mmseqs/search.py", line 126, in mmseqs_search_monomer
    run_mmseqs(mmseqs, ["search", base.joinpath("qdb"), dbbase.joinpath(uniref_db), base.joinpath("res"), base.joinpath("tmp"), "--threads", str(threads)] + search_param)
  File "/home/miniconda3/envs/colabfold/lib/python3.9/site-packages/colabfold/mmseqs/search.py", line 45, in run_mmseqs
    subprocess.check_call([mmseqs] + params)
  File "/home/miniconda3/envs/colabfold/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '[PosixPath('mmseqs'), 'search', PosixPath('msas/qdb'), PosixPath('/home/ColabFold/dbs/uniref30_2302_db'), PosixPath('msas/res'), PosixPath('msas/tmp'), '--threads', '64', '--num-iterations', '3', '--db-load-mode', '0', '-a', '-e', '0.1', '--max-seqs', '10000', '--gpu', '1', '--prefilter-mode', '1']' returned non-zero exit status 1.
The gpu-server mode did not work on the Nvidia A100 GPU either; however, it raised a different error:
INFO:colabfold.mmseqs.search:Running mmseqs createdb msas_batch_test_gpuserver/query.fas msas_batch_test_gpuserver/qdb --shuffle 0
Traceback (most recent call last):
  File "/home/miniconda3/envs/colabfold/bin/colabfold_search", line 8, in <module>
    sys.exit(main())
  File "/home/miniconda3/envs/colabfold/lib/python3.9/site-packages/colabfold/mmseqs/search.py", line 432, in main
    run_mmseqs(
  File "/home/miniconda3/envs/colabfold/lib/python3.9/site-packages/colabfold/mmseqs/search.py", line 45, in run_mmseqs
    subprocess.check_call([mmseqs] + params)
  File "/home/miniconda3/envs/colabfold/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '[PosixPath('mmseqs'), 'createdb', PosixPath('msas_batch_test_gpuserver/query.fas'), PosixPath('msas_batch_test_gpuserver/qdb'), '--shuffle', '0']' died with <Signals.SIGILL: 4>.
/var/spool/slurm/d/job21778/slurm_script: line 31: 31416 Illegal instruction (core dumped) mmseqs gpuserver ~/ColabFold/dbs/colabfold_envdb_202108_db --max-seqs 10000 --db-load-mode 0 --prefilter-mode 1
/var/spool/slurm/d/job21778/slurm_script: line 31: 31417 Illegal instruction (core dumped) mmseqs gpuserver ~/ColabFold/dbs/uniref30_2302_db --max-seqs 10000 --db-load-mode 0 --prefilter-mode 1
@Little-Ghost Please make a new issue. The first issue might be because of some invalid CUDA_VISIBLE_DEVICES setting? Could you post the nvidia-smi output and also echo $CUDA_VISIBLE_DEVICES.
Regarding the second issue. What CPU does the A100 system have and how did you install/compile MMseqs2? It looks like the CPU you are using doesn't have AVX2 support, but that would imply you have a very old CPU.
Please respond in a NEW issue, as both these issues look to be unrelated to the rest of this issue.
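For the new issue, the output of something like the following from the failing nodes would help (plain shell commands, nothing MMseqs2-specific):
# GPU enumeration and driver version as the job sees them
nvidia-smi
# any restriction on visible devices
echo "CUDA_VISIBLE_DEVICES='${CUDA_VISIBLE_DEVICES}'"
# AVX2 support, relevant to the SIGILL / illegal-instruction crash (0 means no AVX2)
grep -c avx2 /proc/cpuinfo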
Here's another clue about this error
CUDA error: initialization error : /work/lib/libmarv/src/marv.cu, line 85 Error: Prefilter died
I got it with an Nvidia 4090, Driver Version: 575.64.03, CUDA Version: 12.9, using the command
colabfold_search 9hix_ab.fasta ~/msa_databases results13 --mmseqs mmseqs-gpu-release18 --gpu 1 --db-load-mode 2 >& output
with the current ColabFold source code (August 29, 2025). But just before this, the exact same run did not produce the error and completed successfully. I subsequently tried 3 times and every run gave the same error in a different mmseqs search command (there are 3 such commands in this 2-sequence test case). So the error is not reliably reproducible, suggesting it is an Nvidia driver problem. This happened to me a few days ago as well; I rebooted the machine and the same colabfold_search runs then worked. nvidia-smi works and does not indicate any problem. When I run ChimeraX it uses the Nvidia driver for OpenGL with no problem. There is no CUDA_VISIBLE_DEVICES set and the machine has one 4090 GPU. Running Ubuntu 24.04 with ColabFold installed in a Python venv and not using Docker. I will reboot now and report whether this "fixes" the problem as it did last time.
After a reboot the error no longer occurred using exactly the same command. Then I ran the same colabfold_search command only adding the --af3-json option and again it worked correctly. Then I ran the same command only adding the --unpack 0 option and it failed with the libmarv error on the very first mmseqs search (after 0.5 seconds).
One more observation. Instead of rebooting again, I decided to clear the page and disk caches, since it seems some persistent CUDA state has become corrupted. I did this on Ubuntu 24.04 with
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
Then the exact same run that failed worked again with no reboot.
Clearing just the page cache
$ sudo sh -c 'echo 1 > /proc/sys/vm/drop_caches'
is not sufficient to fix the problem. So it appears that somehow the disk cache is corrupted, and the following should be sufficient to remedy it, although I have not tried it since the error is hard to reproduce.
$ sudo sh -c 'echo 2 > /proc/sys/vm/drop_caches'
Had a chance to test this clearing of disk cache only (echo 2 ...) and it didn't fix the problem. But clearing both page and disk caches with the following did fix it.
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
I have the same issue with mmseqs-gpu on an Nvidia 6000 Ada, Driver Version 575.64.03, Ubuntu 24.04 (not Docker). I used the precompiled binary.
In my case it's a rare occurrence, and it usually works again if I simply rerun the colabfold_search command, which means I can't reliably reproduce this issue.
... search jobs/myjob/msas/prof_res /media/data/colabfold/colabfold_envdb_202108_db jobs/myjob/msas/res_env jobs/myjob/tmp3 --threads 16 --num-iterations 3 --db-load-mode 0 -a -e 0.1 --max-seqs 10000 --gpu 1 --prefilter-mode 1
ungappedprefilter jobs/myjob/msas/prof_res /media/data/colabfold/colabfold_envdb_202108_db jobs/myjob/msas/tmp3/493405938664241066/pref_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -c 0 -e 0.1 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 1 --min-ungapped-score 15 --max-seqs 10000 --db-load-mode 0 --gpu 1 --gpu-server 0 --gpu-server-wait-timeout 600 --prefilter-mode 1 --threads 16 --compressed 0 -v 3
CUDA error: initialization error : /work/lib/libmarv/src/marv.cu, line 85 Error: Prefilter died
I modified the colabfold_search code to retry every mmseqs subcommand up to 3 times, which largely reduces crashes due to this initialization error, but it still happens occasionally. Maybe I have to try the drop_caches trick from tomgoddard in case a subcommand fails through all 3 retries.
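An alternative to patching search.py would be a thin retry wrapper passed to colabfold_search via --mmseqs. A minimal sketch is below; it blindly retries any non-zero exit and assumes re-running a failed subcommand from scratch is safe (which it seemed to be here), so treat it as a debugging aid only:
#!/bin/bash
# mmseqs-retry: hypothetical wrapper around the real mmseqs binary (adjust the path)
REAL_MMSEQS="$HOME/mmseqs-gpu-release18/bin/mmseqs"
for attempt in 1 2 3; do
    "$REAL_MMSEQS" "$@" && exit 0
    echo "mmseqs failed (attempt $attempt of 3), retrying..." >&2
done
exit 1
Then run e.g. colabfold_search ... --mmseqs ./mmseqs-retry --gpu 1 as before.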
I get this exact same error in about 1 in 10 runs of colabfold_search with an Nvidia 4090, driver 575.64.03, CUDA Version: 12.9, Ubuntu 24.04, and the mmseqs binary release mmseqs-gpu-release18 from the MMseqs2 GitHub. Usually after getting the error once I will continue to get the error every time unless I clear the page and disk caches with
$ sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
That clearing of the page and disk caches works most of the time, but about 1 in 10 times it fails to fix the error.
I have the impression that the error happens more often if I run searches of the exact same sequences multiple times. I have been doing that often while changing mmseqs command options in the colabfold_search search.py code to test various parameters. One hunch is that the problem comes from some uninitialized memory being used: when running the same sequences in the search, the memory layout may be the same and end up with the same uninitialized value.
The error message comes from the MMseqs2 marv.cu code
https://github.com/soedinglab/MMseqs2/blob/2348eb6c754b4b7effb7f8471a9d19d3c0e917e5/lib/libmarv/src/marv.cu#L85
when calling cudaGetDeviceCount(). The CUDA documentation for cudaGetDeviceCount()
https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g18808e54893cfcaafefeab31a73cc55f
notes that cudaErrorInitializationError, cudaErrorInsufficientDriver, or cudaErrorNoDevice can occur if this call initializes the CUDA runtime state. The error message suggests it is cudaErrorInitializationError that is happening.
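To see how often this happens outside of colabfold_search, a loop around the small GPU test from earlier in this thread should exercise the same cudaGetDeviceCount() path (this assumes the mmseqs-linux-gpu tarball and QUERY.fasta from the maintainer's script are in the current directory):
# run the minimal GPU search repeatedly and report any CUDA errors
for i in $(seq 1 20); do
    rm -rf res tmp
    ./mmseqs/bin/mmseqs easy-search QUERY.fasta QUERY.fasta res tmp --gpu 1 > run_$i.log 2>&1 \
        || echo "run $i failed: $(grep -m1 'CUDA error' run_$i.log)"
done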
Thanks for the feedback @tomgoddard. Could you please confirm whether you are using Docker to run MMseqs2? Is this issue happening only when running under Docker? If that is the case, could you please share your docker command?
When running docker I use the following options:
--ulimit memlock=-1 --ulimit stack=67108864 --privileged=true --gpus all -v /dev/shm:/dev/shm
The first 2 options should help on the performance side, but I am wondering if we are running with very different docker parameters.
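Put together with the image from this thread, that would look roughly like this (image tag, mount path, and database names are taken from the original report; the search arguments are only placeholders):
docker run -it --rm --gpus all --privileged=true \
    --ulimit memlock=-1 --ulimit stack=67108864 \
    -v /dev/shm:/dev/shm -v "$(pwd)/data:/home/data" \
    ghcr.io/soedinglab/mmseqs2:17-b804f-cuda12 \
    search /home/data/queryDB /home/data/targetDB /home/data/result /home/data/tmp --gpu 1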
I am not running MMseqs2 under Docker. I use the distributed binary mmseqs-linux-gpu.tar.gz from the MMseqs2 github
https://github.com/soedinglab/MMseqs2/releases/download/18-8cc5c/mmseqs-linux-gpu.tar.gz
I also frequently run Boltz structure prediction using CUDA on this same desktop machine without problems. Still, I suspect it may be a CUDA installation issue, though how that could cause an intermittent failure is a mystery. Repeated runs from the same bash shell, with no environment variables changing, sometimes running identical mmseqs commands via colabfold_search, will sometimes fail. A colabfold_search run invokes more than a dozen mmseqs commands and the failure is always in "mmseqs search", which is not surprising since I think that is the only command using CUDA. But a single colabfold_search runs 2 mmseqs search commands, and in half the failures the first search fails, and in half the second search fails (executing just a minute or two after the first from the colabfold_search Python code).
I said in the previous comment that Boltz structure prediction using CUDA on the same machine works consistently. But I have a vague recollection that a few times it has failed and I simply reran the exact same prediction immediately after the failure and it worked. I'll make a note if I see this happen again. If that is right then very likely this problem is a CUDA issue not related to mmseqs.
Thanks for the feedback @tomgoddard
If I understand correctly, your node configuration is: Nvidia 4090, Driver Version: 575.64.03, CUDA Version: 12.9, is this correct? Could you please do me a favor and try to reproduce the error, but compiling MMseqs2 for your system instead of using the precompiled binary?
I would recommend compiling version 18 (https://github.com/soedinglab/MMseqs2/releases/tag/18-8cc5c) to match the version that is failing for you.
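Roughly the following, with the cmake flag names as I remember them from the MMseqs2 compile instructions, so please double-check against the current README:
# build MMseqs2 release 18 with GPU support from source (flag names are assumptions)
wget https://github.com/soedinglab/MMseqs2/archive/refs/tags/18-8cc5c.tar.gz
tar xzf 18-8cc5c.tar.gz && cd MMseqs2-18-8cc5c
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DENABLE_CUDA=1 \
      -DCMAKE_CUDA_ARCHITECTURES=native -DCMAKE_INSTALL_PREFIX=. ..
make -j"$(nproc)" && make install
# the freshly built binary ends up in ./bin/mmseqs under the build directory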
Best, Alex
I'll try compiling MMseqs release 18 for GPU on the machine where I get the errors and see if that helps. It may take a week because I am busy preparing a talk.
Hey @tomgoddard I was wondering if you had a chance to check this in more detail. Let me know if I can help you.
I compiled MMseqs2 release 18 on my Ubuntu 24.04 system with an Nvidia 4090, CUDA 12.9, driver version 575.64.03, using -DCMAKE_CUDA_ARCHITECTURES="native", and ran a colabfold_search test. It crashed with the libmarv CUDA initialization error on the first try. The second time, with no changes and from the same shell 2 minutes later, it ran successfully. A third try, again with no changes and run from the same shell, crashed again with the CUDA initialization error.
I'll attach a zip archive giving the input query.fas and mmseqs command log text file output28 and a listing of the colabfold sequence databases I used for the first crash case.
Thanks, this is very useful feedback. I will give it a try on our systems; if I can reproduce it successfully, we will open a bug with Nvidia for further investigation.
@tomgoddard could you also share with us the exact database that you are using? Is it the recommended ColabFold DB?