Error running with an NVIDIA Blackwell RTX 5090
Hi all, I installed AF without a problem on the following computer:
ASUS PRIME X870-P WIFI, AMD Ryzen 9 7950X 16-Core Processor
OS: AlmaLinux 8.10 (Cerulean Leopard)
nvidia-smi: NVIDIA-SMI 575.57.08, Driver Version: 575.57.08, CUDA Version: 12.9

I have CUDA 12.9, 12.8, 12.2 and 12.0 installed. Since this is a Blackwell GPU, only the nvidia-open driver flavor is supported (the one I installed).
This is my Python env for AF runs (Python 3.11.11):

absl-py==1.0.0
certifi==2025.4.26
charset-normalizer==2.1.1
docker==5.0.0
idna==3.10
requests==2.28.1
six==1.17.0
urllib3==1.26.20
websocket-client==1.8.0
When I run an AlphaFold job it starts, and I can see jackhmmer running.
Here is the command:
JAX_TRACEBACK_FILTERING=off python3 /path_to_my_dir/ALPHAFOLD_GITHUB/alphafold/docker/run_docker.py \
  --fasta_paths=/path_to_my_dir/ALPHAFOLD/TEST/Rpff2_toAlphF.fst \
  --output_dir=/path_to_my_dir/ALPHAFOLD/TEST/AFOUT/REDUCED-20250606_162608 \
  --data_dir=/path_to_my_dir/ALPHAFOLD/DOWN_DBS \
  --max_template_date=2024-12-31 \
  --db_preset=reduced_dbs \
  --model_preset=monomer_ptm
But when it starts generating the first model I get the following error:
Traceback (most recent call last):
  File "/app/alphafold/run_alphafold.py", line 570, in <module>
    app.run(main)
  File "/opt/conda/lib/python3.11/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/opt/conda/lib/python3.11/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/app/alphafold/run_alphafold.py", line 543, in main
    predict_structure(
  File "/app/alphafold/run_alphafold.py", line 284, in predict_structure
    prediction_result = model_runner.predict(processed_feature_dict,
  File "/app/alphafold/alphafold/model/model.py", line 167, in predict
    result = self.apply(self.params, jax.random.PRNGKey(random_seed), feat)
  File "/opt/conda/lib/python3.11/site-packages/jax/_src/random.py", line 241, in PRNGKey
    return _return_prng_keys(True, _key('PRNGKey', seed, impl))
  File "/opt/conda/lib/python3.11/site-packages/jax/_src/random.py", line 203, in _key
    return prng.random_seed(seed, impl=impl)
  File "/opt/conda/lib/python3.11/site-packages/jax/_src/prng.py", line 639, in random_seed
    return random_seed_p.bind(seeds_arr, impl=impl)
  File "/opt/conda/lib/python3.11/site-packages/jax/_src/core.py", line 387, in bind
    return self.bind_with_trace(find_top_trace(args), args, params)
  File "/opt/conda/lib/python3.11/site-packages/jax/_src/core.py", line 391, in bind_with_trace
    out = trace.process_primitive(self, map(trace.full_raise, args), params)
  File "/opt/conda/lib/python3.11/site-packages/jax/_src/core.py", line 879, in process_primitive
    return primitive.impl(*tracers, **params)
  File "/opt/conda/lib/python3.11/site-packages/jax/_src/prng.py", line 651, in random_seed_impl
    base_arr = random_seed_impl_base(seeds, impl=impl)
  File "/opt/conda/lib/python3.11/site-packages/jax/_src/prng.py", line 656, in random_seed_impl_base
    return seed(seeds)
  File "/opt/conda/lib/python3.11/site-packages/jax/_src/prng.py", line 885, in threefry_seed
    return _threefry_seed(seed)
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: ptxas exited with non-zero error code 65280, output: ptxas fatal : Program with .target 'sm_90a' cannot be compiled to future architecture
: If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
--------------------
For simplicity, JAX has removed its internal frames from the traceback of the following exception. Set JAX_TRACEBACK_FILTERING=off to include these.
PS: I have installed and run AlphaFold without a problem on several computers, all with different hardware but all with the same OS.
Any help will be much appreciated
Having the identical issue here, happy to hear any suggestions for a fix. I'm installing on a second computer with an RTX 5090 (vs the previous 4090), and while the install succeeds, I'm getting the same traceback as above.
@ocstx Have you found a solution on the 5090 yet?
No, I had been focused on AlphaFold3. I found a fix for it in its issue 394; when I have the time I'll try to apply the same idea (updating the Ubuntu image and the Python dependency versions) to AF2, but I lack the knowledge to work out the right combination (I never managed to get ESMfold working). If you could try, I would very much appreciate it.
I've got it running on a GeForce RTX 5090 with driver version 570.153.02 and CUDA 12.8.0. In brief, I believe the most important changes were:
- using the nvidia/cuda:12.8.0-cudnn-devel-ubuntu22.04 image
- using jax[cuda12]==0.6.0
- upgrading to dm-haiku==0.0.14
- installing tensorflow==2.12.0 (alongside tensorflow-cpu)
Here are my Dockerfile and requirements.txt:
I haven't robustly tested it, but I'm getting pdbs out of a --model_preset=multimer run, so this at least gets you to a good starting point!
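If it helps others adapt their own setup, the sketch below shows roughly where those changes land relative to the stock AlphaFold Dockerfile and requirements.txt. It is a sketch only: everything outside the bullet points above (surrounding lines, the /app/alphafold path, how the pip install is split) is an assumption, not a copy of the actual files.

# Sketch only -- surrounding lines and paths are assumptions based on the stock AlphaFold Dockerfile.

# 1) Base image with a Blackwell-aware CUDA 12.8 toolchain:
FROM nvidia/cuda:12.8.0-cudnn-devel-ubuntu22.04

# ... stock apt / conda / hh-suite setup unchanged ...

# 2) requirements.txt: bump dm-haiku and add a GPU-capable TensorFlow
#    alongside the existing tensorflow-cpu pin:
#      dm-haiku==0.0.14
#      tensorflow==2.12.0

# 3) Install a JAX wheel whose bundled CUDA 12 components can target Blackwell (sm_120):
RUN pip3 install --upgrade --no-cache-dir -r /app/alphafold/requirements.txt \
    && pip3 install --upgrade --no-cache-dir "jax[cuda12]==0.6.0"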
It totally worked for me! I ran my standard test run, which uses "--model_preset=monomer_ptm", and it delivered what was expected. The model-generation part (the one really using the GPU) was 20% faster than the same run with an RTX 4090.
Thank you very much @strnadja
Using this Dockerfile produces a conflict in the conda dependencies. Is it necessary to fully install everything for the build to work?
This is strange; last July it worked without errors. Could you include the full stderr/stdout? You'll have to run the Docker build like this:
docker build --no-cache --progress=plain -f docker/Dockerfile -t alphafold . 2>&1 | tee build.log
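If the log is long, pulling out the first failing lines usually narrows things down quickly, e.g.:

# show the first error/failure lines (with line numbers) from the captured build log
grep -n -i -E "error|failed" build.log | head -20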
I cannot see the messages from @976282479 here, but I received them via email. The problem seems to be a conda error when running docker build.
In any case, I checked my build.log. This is in there:
/bin/bash: /opt/conda/lib/libtinfo.so.6: no version information available (required by /bin/bash)
So I'm guessing it is not important. Regarding this one:
#8 1.940 CondaToSNonInteractiveError: Terms of Service have not been accepted for the following channels. Please accept or remove them before proceeding:
#8 1.940     - https://repo.anaconda.com/pkgs/main
#8 1.940     - https://repo.anaconda.com/pkgs/r
#8 1.940
#8 1.940 To accept these channels' Terms of Service, run the following commands:
#8 1.940     conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
#8 1.940     conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r
#8 1.940
#8 1.940 For information on safely removing channels from your conda configuration,
#8 1.940 please see the official documentation:
#8 1.940
#8 1.940 https://www.anaconda.com/docs/tools/working-with-conda/channels
#8 1.940
#8 ERROR: process "/bin/bash -o pipefail -c conda install --quiet --yes conda==24.11.1 pip python=3.11 && conda install --quiet --yes --channel conda-forge libstdcxx-ng>=12.1.0 openmm=8.0.0 pdbfixer && conda clean --all --force-pkgs-dirs --yes" did not complete successfully: exit code: 1
It is not in my build.log, but that looks like a licensing thing; maybe something has changed. Did you follow the instructions in the error? I'm guessing it could be accomplished by changing the line:
RUN conda install --quiet --yes conda==24.11.1 pip python=3.11 \
    && conda install --quiet --yes --channel conda-forge libstdcxx-ng>=12.1.0 openmm=8.0.0 pdbfixer \
    && conda clean --all --force-pkgs-dirs --yes
to:
RUN conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main \
    && conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r \
    && conda install --quiet --yes conda==24.11.1 pip python=3.11 \
    && conda install --quiet --yes --channel conda-forge libstdcxx-ng>=12.1.0 openmm=8.0.0 pdbfixer \
    && conda clean --all --force-pkgs-dirs --yes
But I'm not an expert in this. If someone runs into the same problem and checks this, it would be appreciated.