openfold
CUDA error during inference
I installed openfold on a local server machine using the recommended installation steps:
scripts/install_third_party_dependencies.sh
source scripts/deactivate_conda_env.sh
python3 setup.py install
scripts/install_hh_suite.sh
bash scripts/download_alphafold_dbs.sh data/
Then, when running inference with:
python3 run_pretrained_openfold.py \
../data/fastas/nrt14 \
data/flattened/ \
--uniref90_database_path data/uniref90/uniref90.fasta \
--mgnify_database_path data/mgnify/mgy_clusters_2018_12.fa \
--pdb70_database_path data/pdb70/pdb70 \
--uniclust30_database_path data/uniclust30/uniclust30_2018_08/uniclust30_2018_08 \
--output_dir ../results/pdbs_openfold_predicted/ \
--bfd_database_path data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--jackhmmer_binary_path lib/conda/envs/openfold_venv/bin/jackhmmer \
--hhblits_binary_path lib/conda/envs/openfold_venv/bin/hhblits \
--hhsearch_binary_path lib/conda/envs/openfold_venv/bin/hhsearch \
--kalign_binary_path lib/conda/envs/openfold_venv/bin/kalign \
--config_preset "model_1_ptm" \
--jax_param_path openfold/resources/params/params_model_1.npz \
--model_device "cuda:0"
I get the following error:
INFO:/media/honeypot/baldwin/openfold/openfold/utils/script_utils.py:Successfully loaded JAX parameters at openfold/resources/params/params_model_1.npz...
INFO:/media/honeypot/baldwin/openfold/run_pretrained_openfold.py:Using precomputed alignments for nrt14 at ../results/pdbs_openfold_predicted/alignments...
INFO:/media/honeypot/baldwin/openfold/openfold/utils/script_utils.py:Running inference for nrt14...
Traceback (most recent call last):
File "/media/honeypot/baldwin/openfold/run_pretrained_openfold.py", line 401, in <module>
main(args)
File "/media/honeypot/baldwin/openfold/run_pretrained_openfold.py", line 254, in main
out = run_model(model, processed_feature_dict, tag, args.output_dir)
File "/media/honeypot/baldwin/openfold/openfold/utils/script_utils.py", line 159, in run_model
out = model(batch)
File "/media/honeypot/baldwin/openfold/lib/conda/envs/openfold_venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/media/honeypot/baldwin/openfold/openfold/model/model.py", line 512, in forward
outputs, m_1_prev, z_prev, x_prev = self.iteration(
File "/media/honeypot/baldwin/openfold/openfold/model/model.py", line 245, in iteration
m, z = self.input_embedder(
File "/media/honeypot/baldwin/openfold/lib/conda/envs/openfold_venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/media/honeypot/baldwin/openfold/openfold/model/embedders.py", line 116, in forward
tf_emb_i = self.linear_tf_z_i(tf)
File "/media/honeypot/baldwin/openfold/lib/conda/envs/openfold_venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/media/honeypot/baldwin/openfold/lib/conda/envs/openfold_venv/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
I am using CUDA 11.7. I tried reinstalling openfold, but that didn't help. Any ideas?
I also hit this error when using CUDA 11.6 with gcc 11.6 and A100 cards. Have you fixed it?
In my experience, this error can come from a dimension mismatch in an nn.Linear or nn.Embedding call. You can run the script on the CPU, where the error message is usually more readable, to see what happened.
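A minimal sketch of the kind of shape check that can rule out a dimension mismatch before suspecting the CUDA install. It is plain Python and does not call openfold; `check_linear_input` is a hypothetical helper, not part of openfold or PyTorch.

```python
from typing import Sequence


def check_linear_input(input_shape: Sequence[int], in_features: int) -> None:
    """Raise ValueError if a linear layer would reject this input shape.

    A linear layer multiplies along the last axis of its input, so that
    axis must equal the layer's `in_features`.
    """
    if len(input_shape) == 0:
        raise ValueError("input must have at least one dimension")
    if input_shape[-1] != in_features:
        raise ValueError(
            f"last input dimension {input_shape[-1]} does not match "
            f"in_features {in_features}"
        )
```

With PyTorch, one could call something like `check_linear_input(tuple(tf.shape), self.linear_tf_z_i.in_features)` just before the failing line in the traceback, assuming that layer exposes `in_features` the way `torch.nn.Linear` does. If the check passes, the shapes are probably not the cause and the environment is the next thing to look at.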
I've run into this issue, and there are a few steps you can take to troubleshoot.
First, make sure you always export the library paths before running openfold. I would put these lines in your .bashrc or .bash_profile so that they run in every session:
export LIBRARY_PATH=$CONDA_PREFIX/lib:$LIBRARY_PATH
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
Second, make sure you've run the install_third_party_dependencies script, as the README says. If you've done that and the issue persists, try running python setup.py install. After running either, restart your session so that the changes take effect, and remember to export the library paths again after restarting.
You should now be able to pass the unit tests with bash scripts/run_unit_tests.sh.
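To confirm the first step took effect after a restart, a small check like the following can help. It is only a sketch: `missing_conda_lib_paths` is a hypothetical helper, and it assumes the conda environment is active so that `CONDA_PREFIX` is set.

```python
import os
from typing import List, Mapping


def missing_conda_lib_paths(env: Mapping[str, str]) -> List[str]:
    """Return the names of the path variables that lack $CONDA_PREFIX/lib."""
    prefix = env.get("CONDA_PREFIX")
    if not prefix:
        # No active conda environment, so neither variable can be correct.
        return ["LIBRARY_PATH", "LD_LIBRARY_PATH"]
    wanted = os.path.join(prefix, "lib")
    missing = []
    for name in ("LIBRARY_PATH", "LD_LIBRARY_PATH"):
        # Each variable is a list of directories joined by os.pathsep.
        entries = env.get(name, "").split(os.pathsep)
        if wanted not in entries:
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_conda_lib_paths(os.environ)
    if problems:
        print("Missing $CONDA_PREFIX/lib in: " + ", ".join(problems))
    else:
        print("Library paths look fine.")
```

Running it in the same shell you use for inference shows whether the exports actually reached that session.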
This resolved the issue for me. Hope this helps!