CUDA error: invalid configuration argument : /work/lib/libmarv/src/pssm.cuh

Open saro2-a opened this issue 10 months ago • 2 comments

I'm running colabfold/mmseqs, but I get a CUDA error.

Any advice on what could be wrong? I ran quite a few trials, but couldn't figure it out.

I installed MMseqs2 via:

RUN wget https://mmseqs.com/latest/mmseqs-linux-gpu.tar.gz && \
    tar xvfz mmseqs-linux-gpu.tar.gz

ENV PATH=/app/mmseqs/bin/:$PATH
Using pre-initialized databases from /workspace/db
S3 upload is not enabled, skipping upload
Starting in server mode...
Starting MMseqs2 GPU servers...
No specific CUDA devices set, using all available GPUs
Starting colabfold_envdb server...
Starting uniref30 server...
GPU servers are running. Use CUDA_VISIBLE_DEVICES to control GPU allocation.
Database location: /workspace/db
Server PIDs: 394, 395
Server mode started, now waiting for servers to close
gpuserver /workspace/db/uniref30_2302_db --max-seqs 10000 --db-load-mode 0 --prefilter-mode 1
MMseqs Version:      	a2815df9a6c6da173589fb65b3f71639ea08336d
Use GPU              	0
Max results per query	10000
Preload mode         	0
Prefilter mode       	1
7971804618108345187
gpuserver /workspace/db/colabfold_envdb_202108_db --max-seqs 10000 --db-load-mode 0 --prefilter-mode 1
MMseqs Version:      	a2815df9a6c6da173589fb65b3f71639ea08336d
Use GPU              	0
Max results per query	10000
Preload mode         	0
Prefilter mode       	1
13217853096240131807
CUDA error: invalid configuration argument : /work/lib/libmarv/src/pssm.cuh, line 346
(colabfold_env) root@4ed5491a9855:/app# echo ">seq1\nMKLPVREQVITVQQRGTVYQPPQRDYVLLVSENESSEITQELTVKKGDTVELTCTASQKKSIQFHWKNSNQIKILGNQGSFLTKGPSKLNDRADSRRSLWDQGNFPLIIKNLKIEDSDTYICEVEDQKEEVQLLVFGLTANSDTHLLQGQSLTLTLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLE" > input_sequences.fasta && colabfold_search --mmseqs $(which mmseqs) --gpu 1 --gpu-server 1 input_sequences.fasta ${DB_DIR} msas
INFO:colabfold.mmseqs.search:Running /app/mmseqs/bin/mmseqs createdb msas/query.fas msas/qdb --shuffle 0
createdb msas/query.fas msas/qdb --shuffle 0 

Converting sequences

Time for merging to qdb_h: 0h 0m 0s 0ms
Time for merging to qdb: 0h 0m 0s 0ms
Database type: Aminoacid
Time for processing: 0h 0m 0s 1ms
INFO:colabfold.mmseqs.search:Running /app/mmseqs/bin/mmseqs search msas/qdb /workspace/db/uniref30_2302_db msas/res msas/tmp --threads 64 --num-iterations 3 --db-load-mode 0 -a -e 0.1 --max-seqs 10000 --gpu 1 --prefilter-mode 1 --gpu-server 1
Create directory msas/tmp
search msas/qdb /workspace/db/uniref30_2302_db msas/res msas/tmp --threads 64 --num-iterations 3 --db-load-mode 0 -a -e 0.1 --max-seqs 10000 --gpu 1 --prefilter-mode 1 --gpu-server 1 

ungappedprefilter msas/qdb /workspace/db/uniref30_2302_db.idx msas/tmp/16599252575445546166/pref_0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -c 0 -e 0.1 --cov-mode 0 --comp-bias-corr 1 --comp-bias-corr-scale 1 --min-ungapped-score 15 --max-seqs 10000 --db-load-mode 0 --gpu 1 --gpu-server 1 --gpu-server-wait-timeout 600 --prefilter-mode 1 --threads 64 --compressed 0 -v 3 

Index version: 16
Generated by:  17.b804f
ScoreMatrix:  VTML80.out

Nvidia SMI

(colabfold_env) root@4ed5491a9855:/app# nvidia-smi 
Mon Feb 24 17:12:16 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05             Driver Version: 550.127.05     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100 80GB PCIe          On  |   00000000:81:00.0 Off |                    0 |
| N/A   31C    P0             62W /  300W |   39821MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+

NVCC


(colabfold_env) root@4ed5491a9855:/app# nvcc --version 
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0

saro2-a, Feb 24 '25 17:02
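
A side note on the reproduction command in the original post, which may be unrelated to the CUDA error: in bash with default settings, the builtin `echo` does not expand `\n` inside double quotes. In that case `input_sequences.fasta` would hold a single line containing a literal `\n`, rather than a header line followed by a sequence line. A minimal sketch of writing the file with `printf` instead; the helper name `write_fasta` is hypothetical, and the sequence is shortened here for illustration:

```shell
# Hypothetical helper: write a one-record FASTA file.
# $1 = output file, $2 = sequence name, $3 = sequence
write_fasta() {
  # printf expands \n in its format string in every POSIX shell
  printf '>%s\n%s\n' "$2" "$3" > "$1"
}

# Shortened sequence, for illustration only
write_fasta input_sequences.fasta seq1 MKLPVREQVITVQQRGTVYQPPQRDYVLLVSENESSEITQELTVKKGDTV
```

Running `head -n 1 input_sequences.fasta` afterwards is a quick way to confirm that the header sits on its own line.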

How did you set up the databases? Given the error message, that's the most likely thing to have gone wrong.

milot-mirdita, Feb 24 '25 17:02

Did you call it with the GPU=1 env var (as described in the readme)? Please use the latest mmseqs to create the database.

milot-mirdita, Mar 04 '25 03:03
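
The session log in the original post offers one way to check the second point: the server reports `MMseqs Version: a2815df9a6c6da173589fb65b3f71639ea08336d`, while the index reports `Generated by: 17.b804f`. A rough sketch of comparing the two follows. It rests on an assumption the thread does not confirm: that the part after the last `.` in the index version is a short commit hash of the binary that built it. `same_build` is a hypothetical helper, not part of MMseqs2:

```shell
# Hypothetical helper: succeed when the short hash after the last "."
# in the index version is a prefix of the binary's commit hash.
# $1 = commit hash reported by the binary, $2 = "Generated by" value of the index
same_build() {
  binary_hash="$1"
  index_version="$2"
  short_hash="${index_version##*.}"   # "17.b804f" -> "b804f" (assumed format)
  case "$binary_hash" in
    "$short_hash"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Values copied from the log in the original post
if same_build "a2815df9a6c6da173589fb65b3f71639ea08336d" "17.b804f"; then
  echo "same build"
else
  echo "different builds"
fi
```

With the values from the log this prints `different builds`, which would be consistent with the index having been created by a different MMseqs2 build than the one now serving it. If so, recreating the databases with the current binary is what the advice above points to, for example by re-running the ColabFold setup script as `GPU=1 ./setup_databases.sh /workspace/db` (the script location and exact invocation are assumed here; the README is the reference).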

Hi,

I got the exact same error message, and I had already set up the database with GPU=1 (@milot-mirdita). The thing is, it worked a few days ago.

Best, Rui

RuiWang1998, Jun 26 '25 03:06

A bit of an update:

This looks like the same issue as https://github.com/NVIDIA/nccl/issues/1338. However, our fabric manager was working just fine, so we performed a cold reboot, and it is now fixed.

RuiWang1998, Jun 26 '25 06:06