
CUDA error when running AlphaFold version >= 2.2.4

Open laolanllx opened this issue 1 year ago • 11 comments

I tried to run the latest AlphaFold version on a new machine with an RTX 4090 GPU, CUDA 11.8 (downgraded from 12.2), and Ubuntu 22.04 LTS. I used Anaconda3 to build the AlphaFold environment. However, every AlphaFold version >= 2.2.4 shows the same error.

I was wondering if anyone could help me solve this? Please let me know if you need anything else. Thank you so much!

I've tried the method in #646, but it didn't work.

Nvidia info:

```
(alphafold2.3.1) soft@GPU1:/soft/alphafold-2.3.1$ nvidia-smi
Wed May 17 16:49:49 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA Graphics...   On  | 00000000:01:00.0 Off |                  Off |
| 30%   42C    P8    24W / 350W |      1MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA Graphics...   On  | 00000000:08:00.0 Off |                  Off |
| 30%   41C    P8    19W / 350W |      1MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

Error message: (alphafold2.3.1) soft@GPU1:/soft/alphafold-2.3.1$ python docker/run_docker.py --fasta_paths=/home/soft/Documents/8GZ6.fasta --max_template_date=3000-01-01 --data_dir=/soft/AF2/download/ --output_dir=/home/soft/Documents/8GZ6/ I0517 16:41:16.903001 140650292352064 run_docker.py:113] Mounting /home/soft/Documents -> /mnt/fasta_path_0 I0517 16:41:16.903073 140650292352064 run_docker.py:113] Mounting /soft/AF2/download/uniref90 -> /mnt/uniref90_database_path I0517 16:41:16.903110 140650292352064 run_docker.py:113] Mounting /soft/AF2/download/mgnify -> /mnt/mgnify_database_path I0517 16:41:16.903137 140650292352064 run_docker.py:113] Mounting /soft/AF2/download -> /mnt/data_dir I0517 16:41:16.903162 140650292352064 run_docker.py:113] Mounting /soft/AF2/download/pdb_mmcif/mmcif_files -> /mnt/template_mmcif_dir I0517 16:41:16.903189 140650292352064 run_docker.py:113] Mounting /soft/AF2/download/pdb_mmcif -> /mnt/obsolete_pdbs_path I0517 16:41:16.903218 140650292352064 run_docker.py:113] Mounting /soft/AF2/download/pdb70 -> /mnt/pdb70_database_path I0517 16:41:16.903246 140650292352064 run_docker.py:113] Mounting /soft/AF2/download/uniref30 -> /mnt/uniref30_database_path I0517 16:41:16.903274 140650292352064 run_docker.py:113] Mounting /soft/AF2/download/bfd -> /mnt/bfd_database_path I0517 16:41:18.367146 140650292352064 run_docker.py:255] I0517 08:41:18.366440 139962212468544 templates.py:857] Using precomputed obsolete pdbs /mnt/obsolete_pdbs_path/obsolete.dat. I0517 16:41:18.513591 140650292352064 run_docker.py:255] I0517 08:41:18.513257 139962212468544 xla_bridge.py:353] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker: I0517 16:41:18.642552 140650292352064 run_docker.py:255] I0517 08:41:18.642126 139962212468544 xla_bridge.py:353] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Interpreter Host CUDA I0517 16:41:18.642671 140650292352064 run_docker.py:255] I0517 08:41:18.642353 139962212468544 xla_bridge.py:353] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client' I0517 16:41:18.642703 140650292352064 run_docker.py:255] I0517 08:41:18.642385 139962212468544 xla_bridge.py:353] Unable to initialize backend 'plugin': xla_extension has no attributes named get_plugin_device_client. Compile TensorFlow with //tensorflow/compiler/xla/python:enable_plugin_device set to true (defaults to false) to enable this. 
I0517 16:41:20.699224 140650292352064 run_docker.py:255] I0517 08:41:20.698850 139962212468544 run_alphafold.py:386] Have 5 models: ['model_1_pred_0', 'model_2_pred_0', 'model_3_pred_0', 'model_4_pred_0', 'model_5_pred_0'] I0517 16:41:20.699338 140650292352064 run_docker.py:255] I0517 08:41:20.698937 139962212468544 run_alphafold.py:403] Using random seed 979532966947835319 for the data pipeline I0517 16:41:20.699380 140650292352064 run_docker.py:255] I0517 08:41:20.699029 139962212468544 run_alphafold.py:161] Predicting 8GZ6 I0517 16:41:20.699409 140650292352064 run_docker.py:255] I0517 08:41:20.699223 139962212468544 jackhmmer.py:133] Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmpxzhf05lq/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /mnt/fasta_path_0/8GZ6.fasta /mnt/uniref90_database_path/uniref90.fasta" I0517 16:41:20.758846 140650292352064 run_docker.py:255] I0517 08:41:20.758264 139962212468544 utils.py:36] Started Jackhmmer (uniref90.fasta) query I0517 16:43:32.400591 140650292352064 run_docker.py:255] I0517 08:43:32.399554 139962212468544 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 131.641 seconds I0517 16:43:32.550053 140650292352064 run_docker.py:255] I0517 08:43:32.549127 139962212468544 jackhmmer.py:133] Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmp05y00c2a/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /mnt/fasta_path_0/8GZ6.fasta /mnt/mgnify_database_path/mgy_clusters_2022_05.fa" I0517 16:43:32.603918 140650292352064 run_docker.py:255] I0517 08:43:32.603443 139962212468544 utils.py:36] Started Jackhmmer (mgy_clusters_2022_05.fa) query I0517 16:46:37.409147 140650292352064 run_docker.py:255] I0517 08:46:37.407801 139962212468544 utils.py:40] Finished Jackhmmer (mgy_clusters_2022_05.fa) query in 184.804 seconds I0517 16:46:37.951393 140650292352064 run_docker.py:255] I0517 08:46:37.950919 139962212468544 hhsearch.py:85] Launching subprocess "/usr/bin/hhsearch -i /tmp/tmpp63aisip/query.a3m -o /tmp/tmpp63aisip/output.hhr -maxseq 1000000 -d /mnt/pdb70_database_path/pdb70" I0517 16:46:38.000887 140650292352064 run_docker.py:255] I0517 08:46:38.000401 139962212468544 utils.py:36] Started HHsearch query I0517 16:46:49.137153 140650292352064 run_docker.py:255] I0517 08:46:49.136655 139962212468544 utils.py:40] Finished HHsearch query in 11.136 seconds I0517 16:46:49.446268 140650292352064 run_docker.py:255] I0517 08:46:49.445886 139962212468544 hhblits.py:128] Launching subprocess "/usr/bin/hhblits -i /mnt/fasta_path_0/8GZ6.fasta -cpu 4 -oa3m /tmp/tmpzq5150hf/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /mnt/bfd_database_path/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d /mnt/uniref30_database_path/UniRef30_2021_03" I0517 16:46:49.497454 140650292352064 run_docker.py:255] I0517 08:46:49.497032 139962212468544 utils.py:36] Started HHblits query I0517 16:48:52.683240 140650292352064 run_docker.py:255] I0517 08:48:52.682735 139962212468544 utils.py:40] Finished HHblits query in 123.186 seconds I0517 16:48:52.695582 140650292352064 run_docker.py:255] I0517 08:48:52.695324 139962212468544 templates.py:878] Searching for template for: QVQLQESGGGLVQAGGSLRLSCAASGRTSSVYNMAWFRQTPGKEREFVAAITGNGGTTLYADSVKGRLTISRGNAKNTVSLQMNVLKPDDTAVYYCAAGGWGKERNYAYWGQGTQVTVSSHHHHHH I0517 16:48:54.830226 140650292352064 run_docker.py:255] I0517 08:48:54.829735 139962212468544 
templates.py:267] Found an exact template match 6qd6_C. I0517 16:48:54.838274 140650292352064 run_docker.py:255] I0517 08:48:54.837990 139962212468544 templates.py:267] Found an exact template match 6qd6_G. I0517 16:48:54.930829 140650292352064 run_docker.py:255] I0517 08:48:54.930498 139962212468544 templates.py:267] Found an exact template match 5wts_A. I0517 16:48:55.026873 140650292352064 run_docker.py:255] I0517 08:48:55.026528 139962212468544 templates.py:267] Found an exact template match 6gjs_B. I0517 16:48:55.637419 140650292352064 run_docker.py:255] I0517 08:48:55.636942 139962212468544 templates.py:267] Found an exact template match 6gkd_B. I0517 16:48:55.895157 140650292352064 run_docker.py:255] I0517 08:48:55.894784 139962212468544 templates.py:267] Found an exact template match 6hd8_A. I0517 16:48:56.275886 140650292352064 run_docker.py:255] I0517 08:48:56.275430 139962212468544 templates.py:267] Found an exact template match 6hd9_A. I0517 16:48:56.319466 140650292352064 run_docker.py:255] I0517 08:48:56.319131 139962212468544 templates.py:267] Found an exact template match 6rul_A. I0517 16:48:56.451577 140650292352064 run_docker.py:255] I0517 08:48:56.451225 139962212468544 templates.py:267] Found an exact template match 4pfe_A. I0517 16:48:56.701474 140650292352064 run_docker.py:255] I0517 08:48:56.701014 139962212468544 templates.py:267] Found an exact template match 3sn6_N. I0517 16:48:56.990272 140650292352064 run_docker.py:255] I0517 08:48:56.989793 139962212468544 templates.py:267] Found an exact template match 6pb1_N. I0517 16:48:57.033105 140650292352064 run_docker.py:255] I0517 08:48:57.032745 139962212468544 templates.py:267] Found an exact template match 6rum_A. I0517 16:48:57.086215 140650292352064 run_docker.py:255] I0517 08:48:57.085903 139962212468544 templates.py:267] Found an exact template match 5wb1_A. I0517 16:48:57.110778 140650292352064 run_docker.py:255] I0517 08:48:57.110478 139962212468544 templates.py:267] Found an exact template match 5vm6_A. I0517 16:48:57.186338 140650292352064 run_docker.py:255] I0517 08:48:57.186022 139962212468544 templates.py:267] Found an exact template match 5foj_A. I0517 16:48:57.237214 140650292352064 run_docker.py:255] I0517 08:48:57.236900 139962212468544 templates.py:267] Found an exact template match 5m2w_A. I0517 16:48:57.284224 140650292352064 run_docker.py:255] I0517 08:48:57.283923 139962212468544 templates.py:267] Found an exact template match 5mje_B. I0517 16:48:57.479903 140650292352064 run_docker.py:255] I0517 08:48:57.479458 139962212468544 templates.py:267] Found an exact template match 5vm4_L. I0517 16:48:57.845450 140650292352064 run_docker.py:255] I0517 08:48:57.844991 139962212468544 templates.py:267] Found an exact template match 4cdg_D. I0517 16:48:57.887320 140650292352064 run_docker.py:255] I0517 08:48:57.887031 139962212468544 templates.py:267] Found an exact template match 4gft_B. I0517 16:48:57.988736 140650292352064 run_docker.py:255] I0517 08:48:57.988251 139962212468544 pipeline.py:234] Uniref90 MSA size: 10000 sequences. I0517 16:48:57.988856 140650292352064 run_docker.py:255] I0517 08:48:57.988335 139962212468544 pipeline.py:235] BFD MSA size: 1612 sequences. I0517 16:48:57.988883 140650292352064 run_docker.py:255] I0517 08:48:57.988350 139962212468544 pipeline.py:236] MGnify MSA size: 501 sequences. I0517 16:48:57.988906 140650292352064 run_docker.py:255] I0517 08:48:57.988364 139962212468544 pipeline.py:237] Final (deduplicated) MSA size: 12020 sequences. 
I0517 16:48:57.988928 140650292352064 run_docker.py:255] I0517 08:48:57.988502 139962212468544 pipeline.py:239] Total number of templates (NB: this can include bad templates and is later filtered to top 4): 20. I0517 16:48:58.114603 140650292352064 run_docker.py:255] I0517 08:48:58.113753 139962212468544 run_alphafold.py:191] Running model model_1_pred_0 on 8GZ6 I0517 16:48:59.508914 140650292352064 run_docker.py:255] I0517 08:48:59.508350 139962212468544 model.py:165] Running predict with shape(feat) = {'aatype': (4, 126), 'residue_index': (4, 126), 'seq_length': (4,), 'template_aatype': (4, 4, 126), 'template_all_atom_masks': (4, 4, 126, 37), 'template_all_atom_positions': (4, 4, 126, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 126), 'msa_mask': (4, 508, 126), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 126, 3), 'template_pseudo_beta_mask': (4, 4, 126), 'atom14_atom_exists': (4, 126, 14), 'residx_atom14_to_atom37': (4, 126, 14), 'residx_atom37_to_atom14': (4, 126, 37), 'atom37_atom_exists': (4, 126, 37), 'extra_msa': (4, 5120, 126), 'extra_msa_mask': (4, 5120, 126), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 126), 'true_msa': (4, 508, 126), 'extra_has_deletion': (4, 5120, 126), 'extra_deletion_value': (4, 5120, 126), 'msa_feat': (4, 508, 126, 49), 'target_feat': (4, 126, 22)} I0517 16:48:59.600997 140650292352064 run_docker.py:255] 2023-05-17 08:48:59.600504: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:231] Falling back to the CUDA driver for PTX compilation; ptxas does not support CC 8.9 I0517 16:48:59.601204 140650292352064 run_docker.py:255] 2023-05-17 08:48:59.600544: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:234] Used ptxas at ptxas I0517 16:48:59.608552 140650292352064 run_docker.py:255] 2023-05-17 08:48:59.608081: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:628] failed to get PTX kernel "shift_right_logical" from module: CUDA_ERROR_NOT_FOUND: named symbol not found I0517 16:48:59.608791 140650292352064 run_docker.py:255] 2023-05-17 08:48:59.608157: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 0 failed: INTERNAL: Could not find the corresponding function I0517 16:48:59.612349 140650292352064 run_docker.py:255] Traceback (most recent call last): I0517 16:48:59.612429 140650292352064 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 432, in <module> I0517 16:48:59.612536 140650292352064 run_docker.py:255] app.run(main) I0517 16:48:59.612603 140650292352064 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/absl/app.py", line 312, in run I0517 16:48:59.612677 140650292352064 run_docker.py:255] _run_main(main, args) I0517 16:48:59.612745 140650292352064 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main I0517 16:48:59.612811 140650292352064 run_docker.py:255] sys.exit(main(argv)) I0517 16:48:59.612883 140650292352064 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 408, in main I0517 16:48:59.612949 140650292352064 run_docker.py:255] predict_structure( I0517 16:48:59.613012 140650292352064 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 199, in predict_structure I0517 16:48:59.613077 140650292352064 run_docker.py:255] prediction_result = 
model_runner.predict(processed_feature_dict, I0517 16:48:59.613144 140650292352064 run_docker.py:255] File "/app/alphafold/alphafold/model/model.py", line 167, in predict I0517 16:48:59.613205 140650292352064 run_docker.py:255] result = self.apply(self.params, jax.random.PRNGKey(random_seed), feat) I0517 16:48:59.613268 140650292352064 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/random.py", line 132, in PRNGKey I0517 16:48:59.613330 140650292352064 run_docker.py:255] key = prng.seed_with_impl(impl, seed) I0517 16:48:59.613391 140650292352064 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/prng.py", line 267, in seed_with_impl I0517 16:48:59.613450 140650292352064 run_docker.py:255] return random_seed(seed, impl=impl) I0517 16:48:59.613508 140650292352064 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/prng.py", line 580, in random_seed I0517 16:48:59.613569 140650292352064 run_docker.py:255] return random_seed_p.bind(seeds_arr, impl=impl) I0517 16:48:59.613629 140650292352064 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 329, in bind I0517 16:48:59.613687 140650292352064 run_docker.py:255] return self.bind_with_trace(find_top_trace(args), args, params) I0517 16:48:59.613749 140650292352064 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 332, in bind_with_trace I0517 16:48:59.613809 140650292352064 run_docker.py:255] out = trace.process_primitive(self, map(trace.full_raise, args), params) I0517 16:48:59.613869 140650292352064 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 712, in process_primitive I0517 16:48:59.613931 140650292352064 run_docker.py:255] return primitive.impl(*tracers, **params) I0517 16:48:59.613995 140650292352064 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/prng.py", line 592, in random_seed_impl I0517 16:48:59.614058 140650292352064 run_docker.py:255] base_arr = random_seed_impl_base(seeds, impl=impl) I0517 16:48:59.614119 140650292352064 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/prng.py", line 597, in random_seed_impl_base I0517 16:48:59.614181 140650292352064 run_docker.py:255] return seed(seeds) I0517 16:48:59.614244 140650292352064 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/prng.py", line 832, in threefry_seed I0517 16:48:59.614307 140650292352064 run_docker.py:255] lax.shift_right_logical(seed, lax_internal._const(seed, 32))) I0517 16:48:59.614370 140650292352064 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 515, in shift_right_logical I0517 16:48:59.614432 140650292352064 run_docker.py:255] return shift_right_logical_p.bind(x, y) I0517 16:48:59.614495 140650292352064 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 329, in bind I0517 16:48:59.614555 140650292352064 run_docker.py:255] return self.bind_with_trace(find_top_trace(args), args, params) I0517 16:48:59.614619 140650292352064 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 332, in bind_with_trace I0517 16:48:59.614684 140650292352064 run_docker.py:255] out = trace.process_primitive(self, map(trace.full_raise, args), params) I0517 16:48:59.614765 140650292352064 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 712, in process_primitive I0517 16:48:59.614831 140650292352064 run_docker.py:255] 
return primitive.impl(*tracers, **params) I0517 16:48:59.614899 140650292352064 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/dispatch.py", line 115, in apply_primitive I0517 16:48:59.614965 140650292352064 run_docker.py:255] return compiled_fun(*args) I0517 16:48:59.615031 140650292352064 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/dispatch.py", line 200, in <lambda> I0517 16:48:59.615100 140650292352064 run_docker.py:255] return lambda *args, **kw: compiled(*args, **kw)[0] I0517 16:48:59.615169 140650292352064 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/dispatch.py", line 895, in _execute_compiled I0517 16:48:59.615240 140650292352064 run_docker.py:255] out_flat = compiled.execute(in_flat) I0517 16:48:59.615311 140650292352064 run_docker.py:255] jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Could not find the corresponding function
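
From the traceback, the failure is triggered by `jax.random.PRNGKey` (via `lax.shift_right_logical`), before the model itself runs, so it can be reproduced without the full MSA/template pipeline. A minimal sketch of that check, assuming the default `alphafold` image tag from `docker/run_docker.py` and that `python3` is on the image's PATH:

```bash
# Run only the JAX call that fails in the traceback above inside the AlphaFold container.
# With a broken CUDA/ptxas setup this should raise the same CUDA_ERROR_NOT_FOUND /
# XlaRuntimeError within seconds, instead of after several minutes of database searches.
docker run --rm --gpus all --entrypoint python3 alphafold -c \
  'import jax; print(jax.devices()); print(jax.random.PRNGKey(0))'
```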

laolanllx avatar May 17 '23 09:05 laolanllx

: (512,), 'bert_mask': (512, 550), 'seq_mask': (550,), 'msa_mask': (512, 550)} I0526 16:59:03.898252 140519854102336 run_docker.py:235] Traceback (most recent call last): I0526 16:59:03.898312 140519854102336 run_docker.py:235] File "/app/alphafold/run_alphafold.py", line 459, in I0526 16:59:03.898356 140519854102336 run_docker.py:235] app.run(main) I0526 16:59:03.898376 140519854102336 run_docker.py:235] File "/opt/conda/lib/python3.8/site-packages/absl/app.py", line 312, in run I0526 16:59:03.898396 140519854102336 run_docker.py:235] _run_main(main, args) I0526 16:59:03.898415 140519854102336 run_docker.py:235] File "/opt/conda/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main I0526 16:59:03.898434 140519854102336 run_docker.py:235] sys.exit(main(argv)) I0526 16:59:03.898452 140519854102336 run_docker.py:235] File "/app/alphafold/run_alphafold.py", line 435, in main I0526 16:59:03.898472 140519854102336 run_docker.py:235] predict_structure( I0526 16:59:03.898491 140519854102336 run_docker.py:235] File "/app/alphafold/run_alphafold.py", line 221, in predict_structure I0526 16:59:03.898511 140519854102336 run_docker.py:235] prediction_result = model_runner.predict(processed_feature_dict, I0526 16:59:03.898529 140519854102336 run_docker.py:235] File "/app/alphafold/alphafold/model/model.py", line 167, in predict I0526 16:59:03.898547 140519854102336 run_docker.py:235] result = self.apply(self.params, jax.random.PRNGKey(random_seed), feat) I0526 16:59:03.898566 140519854102336 run_docker.py:235] File "/opt/conda/lib/python3.8/site-packages/jax/_src/random.py", line 132, in PRNGKey I0526 16:59:03.898584 140519854102336 run_docker.py:235] key = prng.seed_with_impl(impl, seed) I0526 16:59:03.898602 140519854102336 run_docker.py:235] File "/opt/conda/lib/python3.8/site-packages/jax/_src/prng.py", line 267, in seed_with_impl I0526 16:59:03.898622 140519854102336 run_docker.py:235] return random_seed(seed, impl=impl) I0526 16:59:03.898639 140519854102336 run_docker.py:235] File "/opt/conda/lib/python3.8/site-packages/jax/_src/prng.py", line 580, in random_seed I0526 16:59:03.898658 140519854102336 run_docker.py:235] return random_seed_p.bind(seeds_arr, impl=impl) I0526 16:59:03.898677 140519854102336 run_docker.py:235] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 329, in bind I0526 16:59:03.898695 140519854102336 run_docker.py:235] return self.bind_with_trace(find_top_trace(args), args, params) I0526 16:59:03.898713 140519854102336 run_docker.py:235] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 332, in bind_with_trace I0526 16:59:03.898737 140519854102336 run_docker.py:235] out = trace.process_primitive(self, map(trace.full_raise, args), params) I0526 16:59:03.898755 140519854102336 run_docker.py:235] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 712, in process_primitive I0526 16:59:03.898774 140519854102336 run_docker.py:235] return primitive.impl(*tracers, **params) I0526 16:59:03.898792 140519854102336 run_docker.py:235] File "/opt/conda/lib/python3.8/site-packages/jax/_src/prng.py", line 592, in random_seed_impl I0526 16:59:03.898811 140519854102336 run_docker.py:235] base_arr = random_seed_impl_base(seeds, impl=impl) I0526 16:59:03.898829 140519854102336 run_docker.py:235] File "/opt/conda/lib/python3.8/site-packages/jax/_src/prng.py", line 597, in random_seed_impl_base I0526 16:59:03.898847 140519854102336 run_docker.py:235] return seed(seeds) I0526 16:59:03.898866 140519854102336 run_docker.py:235] File 
"/opt/conda/lib/python3.8/site-packages/jax/_src/prng.py", line 832, in threefry_seed I0526 16:59:03.898885 140519854102336 run_docker.py:235] lax.shift_right_logical(seed, lax_internal._const(seed, 32))) I0526 16:59:03.898904 140519854102336 run_docker.py:235] File "/opt/conda/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 515, in shift_right_logical I0526 16:59:03.898922 140519854102336 run_docker.py:235] return shift_right_logical_p.bind(x, y) I0526 16:59:03.898941 140519854102336 run_docker.py:235] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 329, in bind I0526 16:59:03.898961 140519854102336 run_docker.py:235] return self.bind_with_trace(find_top_trace(args), args, params) I0526 16:59:03.898979 140519854102336 run_docker.py:235] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 332, in bind_with_trace I0526 16:59:03.898997 140519854102336 run_docker.py:235] out = trace.process_primitive(self, map(trace.full_raise, args), params) I0526 16:59:03.899016 140519854102336 run_docker.py:235] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 712, in process_primitive I0526 16:59:03.899034 140519854102336 run_docker.py:235] return primitive.impl(*tracers, **params) I0526 16:59:03.899053 140519854102336 run_docker.py:235] File "/opt/conda/lib/python3.8/site-packages/jax/_src/dispatch.py", line 113, in apply_primitive I0526 16:59:03.899071 140519854102336 run_docker.py:235] compiled_fun = xla_primitive_callable(prim, *unsafe_map(arg_spec, args), I0526 16:59:03.899089 140519854102336 run_docker.py:235] File "/opt/conda/lib/python3.8/site-packages/jax/_src/util.py", line 253, in wrapper I0526 16:59:03.899107 140519854102336 run_docker.py:235] return cached(config._trace_context(), *args, **kwargs) I0526 16:59:03.899126 140519854102336 run_docker.py:235] File "/opt/conda/lib/python3.8/site-packages/jax/_src/util.py", line 246, in cached I0526 16:59:03.899144 140519854102336 run_docker.py:235] return f(*args, **kwargs) I0526 16:59:03.899163 140519854102336 run_docker.py:235] File "/opt/conda/lib/python3.8/site-packages/jax/_src/dispatch.py", line 197, in xla_primitive_callable I0526 16:59:03.899182 140519854102336 run_docker.py:235] compiled = _xla_callable_uncached(lu.wrap_init(prim_fun), device, None, I0526 16:59:03.899200 140519854102336 run_docker.py:235] File "/opt/conda/lib/python3.8/site-packages/jax/_src/dispatch.py", line 359, in _xla_callable_uncached I0526 16:59:03.899218 140519854102336 run_docker.py:235] return lower_xla_callable(fun, device, backend, name, donated_invars, False, I0526 16:59:03.899235 140519854102336 run_docker.py:235] File "/opt/conda/lib/python3.8/site-packages/jax/_src/dispatch.py", line 996, in compile I0526 16:59:03.899254 140519854102336 run_docker.py:235] self._executable = XlaCompiledComputation.from_xla_computation( I0526 16:59:03.899272 140519854102336 run_docker.py:235] File "/opt/conda/lib/python3.8/site-packages/jax/_src/dispatch.py", line 1194, in from_xla_computation I0526 16:59:03.899292 140519854102336 run_docker.py:235] compiled = compile_or_get_cached(backend, xla_computation, options, I0526 16:59:03.899310 140519854102336 run_docker.py:235] File "/opt/conda/lib/python3.8/site-packages/jax/_src/dispatch.py", line 1077, in compile_or_get_cached I0526 16:59:03.899328 140519854102336 run_docker.py:235] return backend_compile(backend, serialized_computation, compile_options, I0526 16:59:03.899347 140519854102336 run_docker.py:235] File "/opt/conda/lib/python3.8/site-packages/jax/_src/profiler.py", 
line 314, in wrapper I0526 16:59:03.899366 140519854102336 run_docker.py:235] return func(*args, **kwargs) I0526 16:59:03.899384 140519854102336 run_docker.py:235] File "/opt/conda/lib/python3.8/site-packages/jax/_src/dispatch.py", line 1012, in backend_compile I0526 16:59:03.899403 140519854102336 run_docker.py:235] return backend.compile(built_c, compile_options=options) I0526 16:59:03.899421 140519854102336 run_docker.py:235] jaxlib.xla_extension.XlaRuntimeError: FAILED_PRECONDITION: Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas --version

Faezov avatar May 26 '23 20:05 Faezov

Same `jaxlib.xla_extension.XlaRuntimeError: FAILED_PRECONDITION: Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas --version` error.

Faezov avatar May 26 '23 21:05 Faezov

I got the same error when building with the CUDA 11.8 tools, but it works with my older CUDA 11.4 tools, even though the CUDA version reported by my driver is much newer.

What I'm running:

  • V100
  • Driver 535.54.03
  • nvidia-smi shows CUDA Version 12.2

A Docker image built with the 11.4 tools works; one built with the 11.8 tools doesn't.
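
One way to see which CUDA toolchain a given image actually bundles is to look for `ptxas` inside it. A sketch, assuming the image is tagged `alphafold` (adjust the tag to your build):

```bash
# Check whether ptxas exists in the image and which CUDA version it comes from.
# XLA needs a ptxas that supports the GPU's compute capability (sm_70 for V100 is fine
# with 11.x; sm_89 for RTX 4090 needs CUDA >= 11.8). The nvidia/cuda *-runtime-* base
# images generally do not ship ptxas at all, while the *-devel-* images do.
docker run --rm --entrypoint /bin/bash alphafold -c \
  'which ptxas && ptxas --version; nvcc --version'
```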

RJ3 avatar Aug 05 '23 19:08 RJ3

For a 4090 machine, you need to change the following in the Dockerfile:

  • `ARG CUDA=11.1.1` → `ARG CUDA=11.8.0`
  • `FROM nvidia/cuda:${CUDA}-cudnn8-runtime-ubuntu18.04` → `FROM nvidia/cuda:${CUDA}-cudnn8-devel-ubuntu20.04`

Then, rebuild.
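
For completeness, a sketch of the rebuild step after editing `docker/Dockerfile` (the `alphafold` tag is the default image name that `docker/run_docker.py` looks for; change it if you use another name):

```bash
# Rebuild the image from the repository root after editing docker/Dockerfile,
# then rerun predictions through docker/run_docker.py as before.
docker build -f docker/Dockerfile -t alphafold .
```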

HanLiii avatar Aug 15 '23 20:08 HanLiii

@HanLiii Thanks for the information. Even though I changed it as you said and rebuilt it for a 4090 machine, the same error still comes up. Did you succeed with that change?

ORCAaAaA-ui avatar Aug 16 '23 17:08 ORCAaAaA-ui

@RJ3 can you share your working Dockerfile please?

rocketman8080 avatar Aug 21 '23 02:08 rocketman8080

> (quotes the original post above)

I got exactly the same error. Have you solved it? My Nvidia info is as follows:

```
Fri Nov 17 14:21:52 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  Off |
|  0%   41C    P8    28W / 450W |      6MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3164      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
```

ChengkuiZhao avatar Nov 17 '23 06:11 ChengkuiZhao

> (quotes the original post above)
lax.shift_right_logical(seed, lax_internal._const(seed, 32))) I0517 16:48:59.614370 140650292352064 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 515, in shift_right_logical I0517 16:48:59.614432 140650292352064 run_docker.py:255] return shift_right_logical_p.bind(x, y) I0517 16:48:59.614495 140650292352064 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 329, in bind I0517 16:48:59.614555 140650292352064 run_docker.py:255] return self.bind_with_trace(find_top_trace(args), args, params) I0517 16:48:59.614619 140650292352064 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 332, in bind_with_trace I0517 16:48:59.614684 140650292352064 run_docker.py:255] out = trace.process_primitive(self, map(trace.full_raise, args), params) I0517 16:48:59.614765 140650292352064 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 712, in process_primitive I0517 16:48:59.614831 140650292352064 run_docker.py:255] return primitive.impl(*tracers, **params) I0517 16:48:59.614899 140650292352064 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/dispatch.py", line 115, in apply_primitive I0517 16:48:59.614965 140650292352064 run_docker.py:255] return compiled_fun(*args) I0517 16:48:59.615031 140650292352064 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/dispatch.py", line 200, in <lambda> I0517 16:48:59.615100 140650292352064 run_docker.py:255] return lambda *args, **kw: compiled(*args, **kw)[0] I0517 16:48:59.615169 140650292352064 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/dispatch.py", line 895, in _execute_compiled I0517 16:48:59.615240 140650292352064 run_docker.py:255] out_flat = compiled.execute(in_flat) I0517 16:48:59.615311 140650292352064 run_docker.py:255] jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Could not find the corresponding function

I got exactly the same error. Have you solved it? My NVIDIA info is as follows:

Fri Nov 17 14:21:52 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05    Driver Version: 525.147.05    CUDA Version: 12.0   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  Off |
|  0%   41C    P8    28W / 450W |      6MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3164      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+

Unfortunately, I haven't solved it. Recently I have also been hitting the same error as #853. I tried changing the Dockerfile, following https://github.com/google-deepmind/alphafold/issues/764#issuecomment-1679537433, but it didn't work and caused a crash.
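For what it's worth, the quickest way I've found to reproduce this class of failure without a full prediction run is to check whether jaxlib inside the built image can run a trivial op on the GPU at all. The commands below are a generic sketch, not part of AlphaFold itself: `alphafold` is the image tag from the default `docker build -f docker/Dockerfile -t alphafold .`, and the `--entrypoint` overrides assume the image defines its own entrypoint.

```bash
# Is the GPU visible inside the container at all?
docker run --rm --gpus all --entrypoint nvidia-smi alphafold

# Can jaxlib enumerate the GPU and compile/run a tiny XLA kernel on it?
docker run --rm --gpus all --entrypoint python alphafold \
  -c 'import jax; print(jax.devices()); print(jax.numpy.arange(4) * 2)'
```

If the second command fails with the same `CUDA_ERROR_NOT_FOUND` / "Could not find the corresponding function" message, the problem is the ptxas/driver vs. compute-capability mismatch rather than anything AlphaFold-specific.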

laolanllx · Nov 17 '23 06:11

> (quoted: the original issue report and the two comments above)

I solved the "chunked" error by upgrading Docker, so maybe you can try that. But I'm still confused by the jaxlib error.
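In case it is the Docker side, a generic way to confirm that the Docker + NVIDIA container toolkit pairing works at all, independently of AlphaFold (the image tag below is just an example of a public CUDA 11.8 image):

```bash
# Standalone check of the NVIDIA container runtime; any recent CUDA image works.
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# And check the Docker version itself, since upgrading Docker is what fixed
# the "chunked" error for me.
docker version
```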

ChengkuiZhao · Nov 17 '23 07:11

> (quoted: the original issue report)

I just got it working with CUDA==11.8.0 and the nvidia/cuda:${CUDA}-cudnn8-devel-ubuntu22.04 base image! Maybe you can try this config as well.
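Roughly, the rebuild looks like the sketch below. The `FROM` line edit is the change described above (the exact Dockerfile contents differ between AlphaFold releases), and the `run_docker.py` flags are simply copied from the original post, so adjust the paths to your setup.

```bash
# 1. Edit docker/Dockerfile so the base image line reads:
#      FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
# 2. Rebuild the image (the tag "alphafold" matches the README default).
docker build -f docker/Dockerfile -t alphafold .

# 3. Rerun the prediction.
python docker/run_docker.py \
  --fasta_paths=/home/soft/Documents/8GZ6.fasta \
  --max_template_date=3000-01-01 \
  --data_dir=/soft/AF2/download/ \
  --output_dir=/home/soft/Documents/8GZ6/
```

Note that the jax/jaxlib wheels installed inside the image also need to match the CUDA major version of the base image, otherwise similar XLA errors can come back.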

ChengkuiZhao · Nov 17 '23 10:11

Same kind of `jaxlib.xla_extension.XlaRuntimeError` here: `FAILED_PRECONDITION: Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas --version`

`conda install -c nvidia cuda-nvcc`
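If it helps anyone: the `cuda-nvcc` package from the nvidia channel ships `ptxas`, so after installing it you can confirm (with the conda environment activated; generic shell, nothing AlphaFold-specific) that a new enough copy is on `PATH`. For an RTX 4090 (compute capability 8.9) it has to come from CUDA 11.8 or newer.

```bash
# Run inside the activated alphafold conda environment.
which ptxas        # should resolve inside the env, e.g. .../envs/<env>/bin/ptxas
ptxas --version    # must be new enough to target sm_89, i.e. CUDA >= 11.8
```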

RGardenia · May 20 '24 13:05