CUDA error: invalid configuration argument : /work/lib/libmarv/src/pssm.cuh
I'm running ColabFold with MMseqs2, but I get a CUDA error.
Any advice on what could be wrong? I ran quite a few trials but couldn't figure it out.
I installed MMseqs2 via:
RUN wget https://mmseqs.com/latest/mmseqs-linux-gpu.tar.gz && \
tar xvfz mmseqs-linux-gpu.tar.gz
ENV PATH=/app/mmseqs/bin/:$PATH
Using pre-initialized databases from /workspace/db
S3 upload is not enabled, skipping upload
Starting in server mode...
Starting MMseqs2 GPU servers...
No specific CUDA devices set, using all available GPUs
Starting colabfold_envdb server...
Starting uniref30 server...
GPU servers are running. Use CUDA_VISIBLE_DEVICES to control GPU allocation.
Database location: /workspace/db
Server PIDs: 394, 395
Server mode started, now waiting for servers to close
gpuserver /workspace/db/uniref30_2302_db --max-seqs 10000 --db-load-mode 0 --prefilter-mode 1
MMseqs Version: a2815df9a6c6da173589fb65b3f71639ea08336d
Use GPU 0
Max results per query 10000
Preload mode 0
Prefilter mode 1
7971804618108345187
gpuserver /workspace/db/colabfold_envdb_202108_db --max-seqs 10000 --db-load-mode 0 --prefilter-mode 1
MMseqs Version: a2815df9a6c6da173589fb65b3f71639ea08336d
Use GPU 0
Max results per query 10000
Preload mode 0
Prefilter mode 1
13217853096240131807
CUDA error: invalid configuration argument : /work/lib/libmarv/src/pssm.cuh, line 346
(colabfold_env) root@4ed5491a9855:/app# echo ">seq1\nMKLPVREQVITVQQRGTVYQPPQRDYVLLVSENESSEITQELTVKKGDTVELTCTASQKKSIQFHWKNSNQIKILGNQGSFLTKGPSKLNDRADSRRSLWDQGNFPLIIKNLKIEDSDTYICEVEDQKEEVQLLVFGLTANSDTHLLQGQSLTLTLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLE" > input_sequences.fasta && colabfold_search --mmseqs $(which mmseqs) --gpu 1 --gpu-server 1 input_sequences.fasta ${DB_DIR} msas
INFO:colabfold.mmseqs.search:Running /app/mmseqs/bin/mmseqs createdb msas/query.fas msas/qdb --shuffle 0
createdb msas/query.fas msas/qdb --shuffle 0
Converting sequences
Time for merging to qdb_h: 0h 0m 0s 0ms
Time for merging to qdb: 0h 0m 0s 0ms
Database type: Aminoacid
Time for processing: 0h 0m 0s 1ms
INFO:colabfold.mmseqs.search:Running /app/mmseqs/bin/mmseqs search msas/qdb /workspace/db/uniref30_2302_db msas/res msas/tmp --threads 64 --num-iterations 3 --db-load-mode 0 -a -e 0.1 --max-seqs 10000 --gpu 1 --prefilter-mode 1 --gpu-server 1
Create directory msas/tmp
search msas/qdb /workspace/db/uniref30_2302_db msas/res msas/tmp --threads 64 --num-iterations 3 --db-load-mode 0 -a -e 0.1 --max-seqs 10000 --gpu 1 --prefilter-mode 1 --gpu-server 1
ungappedprefilter msas/qdb /workspace/db/uniref30_2302_db.idx msas/tmp/16599252575445546166/pref_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -c 0 -e 0.1 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 1 --min-ungapped-score 15 --max-seqs 10000 --db-load-mode 0 --gpu 1 --gpu-server 1 --gpu-server-wait-timeout 600 --prefilter-mode 1 --threads 64 --compressed 0 -v 3
Index version: 16
Generated by: 17.b804f
ScoreMatrix: VTML80.out
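One thing that may be worth ruling out in the query command above: if the shell is bash, `echo` without `-e` does not expand `\n`, so `input_sequences.fasta` could end up as a single header line with no sequence. Whether that is related to the CUDA error is not established here. A minimal sketch of writing the file with `printf` instead, where `SEQ` is a shortened placeholder and not the full sequence:

```shell
# Write a two-line FASTA file. printf expands \n in its format string,
# whereas bash's builtin echo prints a literal backslash-n by default.
# SEQ is a shortened placeholder, not the full sequence used in the post.
SEQ="MKLPVREQVITVQQRGTVYQPP"
printf '>seq1\n%s\n' "${SEQ}" > input_sequences.fasta

# Sanity check: the file should have one header line and two lines in total.
grep -c '^>' input_sequences.fasta   # number of header lines
wc -l < input_sequences.fasta        # total number of lines
```

If the second count comes back as 1 for the original file, the sequence never made it onto its own line.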
Nvidia SMI
(colabfold_env) root@4ed5491a9855:/app# nvidia-smi
Mon Feb 24 17:12:16 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05 Driver Version: 550.127.05 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100 80GB PCIe On | 00000000:81:00.0 Off | 0 |
| N/A 31C P0 62W / 300W | 39821MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
NVCC
(colabfold_env) root@4ed5491a9855:/app# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
How did you set up the databases? Given the error message, that's the most likely thing to have gone wrong.
Did you run the database setup with the GPU=1 env var (as described in the readme)? Please use the latest mmseqs to create the database.
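As a rough first check of the database setup, something like the following could be used. It is only a sketch: `check_db_files` is a hypothetical helper, the file names are taken from the log earlier in this thread, and the script name `setup_databases.sh` in the commented-out line is an assumption about the ColabFold repository. Finding the files does not prove the databases were built for GPU; it only catches an incomplete setup.

```shell
# Hypothetical helper: report which expected database files are missing.
# The file names come from the log in this thread; the directory is $1.
check_db_files() {
  local db_dir="$1" missing=0 f
  for f in uniref30_2302_db uniref30_2302_db.idx colabfold_envdb_202108_db; do
    if [ ! -e "${db_dir}/${f}" ]; then
      echo "missing: ${db_dir}/${f}"
      missing=1
    fi
  done
  # Returns 0 when every file was found, 1 otherwise.
  return "${missing}"
}

# Example (commented out because the rebuild downloads large databases;
# the script name is an assumption):
# check_db_files /workspace/db || GPU=1 ./setup_databases.sh /workspace/db
```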
Hi,
I got the exact same message, and I did set up the database with GPU=1 beforehand (@milot-mirdita). The strange thing is that it worked a few days ago.
Best, Rui
A quick update:
This looks like the same issue as https://github.com/NVIDIA/nccl/issues/1338. However, our fabric manager was working just fine. As a result, we performed a cold reboot, and it is now fixed.
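For anyone who lands here with the same error, a quick look at the GPU state before resorting to a reboot might help narrow things down. This is a minimal sketch, not a confirmed diagnostic for this error: `gpu_sanity_check` is a hypothetical helper, and the fabric manager service name in the comment is an assumption that only applies to NVSwitch-based systems.

```shell
# Hypothetical helper: print basic per-GPU state using nvidia-smi.
# Returns 127 if nvidia-smi is not available on PATH.
gpu_sanity_check() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found on PATH"
    return 127
  fi
  # One CSV row per GPU: index, model, driver version, memory in use, total memory.
  nvidia-smi --query-gpu=index,name,driver_version,memory.used,memory.total --format=csv
}

# Example:
# gpu_sanity_check
# On NVSwitch systems the fabric manager state can also be inspected
# (assumed service name):
# systemctl status nvidia-fabricmanager
```

Memory reported as in use while no servers are running, or a failure of `nvidia-smi` itself, would point at the driver or hardware state rather than at the databases.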