
AlphaFold doesn't work on an RTX 4090?

Open · C10H opened this issue on Nov 30, 2023 · 2 comments

Environment: Windows 11 + WSL2 + Ubuntu 22.04, Docker for WSL 4.25.0, NVIDIA RTX 4090, CUDA 12.2

conda list:

# packages in environment at /home/a22/anaconda3/envs/alphafold2-docker:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main    defaults
_openmp_mutex             5.1                       1_gnu    defaults
absl-py                   1.0.0                    pypi_0    pypi
bzip2                     1.0.8                h7b6447c_0    defaults
ca-certificates           2023.08.22           h06a4308_0    defaults
certifi                   2023.11.17               pypi_0    pypi
charset-normalizer        3.3.2                    pypi_0    pypi
docker                    5.0.0                    pypi_0    pypi
idna                      3.6                      pypi_0    pypi
ld_impl_linux-64          2.38                 h1181459_1    defaults
libffi                    3.4.4                h6a678d5_0    defaults
libgcc-ng                 11.2.0               h1234567_1    defaults
libgomp                   11.2.0               h1234567_1    defaults
libstdcxx-ng              11.2.0               h1234567_1    defaults
libuuid                   1.41.5               h5eee18b_0    defaults
ncurses                   6.4                  h6a678d5_0    defaults
openssl                   3.0.12               h7f8727e_0    defaults
pip                       23.3.1          py311h06a4308_0    defaults
python                    3.11.5               h955ad1f_0    defaults
readline                  8.2                  h5eee18b_0    defaults
requests                  2.31.0                   pypi_0    pypi
setuptools                68.0.0          py311h06a4308_0    defaults
six                       1.16.0                   pypi_0    pypi
sqlite                    3.41.2               h5eee18b_0    defaults
tk                        8.6.12               h1ccaba5_0    defaults
tzdata                    2023c                h04d1e81_0    defaults
urllib3                   2.1.0                    pypi_0    pypi
websocket-client          1.6.4                    pypi_0    pypi
wheel                     0.41.2          py311h06a4308_0    defaults
xz                        5.4.2                h5eee18b_0    defaults
zlib                      1.2.13               h5eee18b_0    defaults
(alphafold2-docker) a22@C10H15N:~/alphafold$ docker-compose version
Docker Compose version v2.23.0-desktop.1
(alphafold2-docker) a22@C10H15N:~/alphafold$ bash run.sh
I1130 21:29:25.508691 140208603498304 run_docker.py:116] Mounting /home/a22/alphafold/project/6y4f -> /mnt/fasta_path_0
I1130 21:29:25.508810 140208603498304 run_docker.py:116] Mounting /home/a22/afdata/uniref90 -> /mnt/uniref90_database_path
I1130 21:29:25.508872 140208603498304 run_docker.py:116] Mounting /home/a22/afdata/mgnify -> /mnt/mgnify_database_path
I1130 21:29:25.508915 140208603498304 run_docker.py:116] Mounting /home/a22/afdata -> /mnt/data_dir
I1130 21:29:25.508964 140208603498304 run_docker.py:116] Mounting /home/a22/afdata/pdb_mmcif/mmcif_files -> /mnt/template_mmcif_dir
I1130 21:29:25.509018 140208603498304 run_docker.py:116] Mounting /home/a22/afdata/pdb_mmcif -> /mnt/obsolete_pdbs_path
I1130 21:29:25.509066 140208603498304 run_docker.py:116] Mounting /home/a22/afdata/pdb70 -> /mnt/pdb70_database_path
I1130 21:29:25.509122 140208603498304 run_docker.py:116] Mounting /home/a22/afdata/uniref30 -> /mnt/uniref30_database_path
I1130 21:29:25.509181 140208603498304 run_docker.py:116] Mounting /home/a22/afdata/bfd -> /mnt/bfd_database_path
I1130 21:29:26.290893 140208603498304 run_docker.py:258] /sbin/ldconfig.real: /usr/lib/x86_64-linux-gnu/libcuda.so.1 is not a symbolic link
I1130 21:29:26.290986 140208603498304 run_docker.py:258]
I1130 21:29:31.820214 140208603498304 run_docker.py:258] I1130 13:29:31.819571 139803493142784 templates.py:858] Using precomputed obsolete pdbs /mnt/obsolete_pdbs_path/obsolete.dat.
I1130 21:29:34.677552 140208603498304 run_docker.py:258] I1130 13:29:34.676982 139803493142784 xla_bridge.py:353] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
I1130 21:29:35.058179 140208603498304 run_docker.py:258] I1130 13:29:35.057624 139803493142784 xla_bridge.py:353] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Interpreter Host CUDA
I1130 21:29:35.058630 140208603498304 run_docker.py:258] I1130 13:29:35.058142 139803493142784 xla_bridge.py:353] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
I1130 21:29:35.058690 140208603498304 run_docker.py:258] I1130 13:29:35.058253 139803493142784 xla_bridge.py:353] Unable to initialize backend 'plugin': xla_extension has no attributes named get_plugin_device_client. Compile TensorFlow with //tensorflow/compiler/xla/python:enable_plugin_device set to true (defaults to false) to enable this.
I1130 21:29:35.059734 140208603498304 run_docker.py:258] 2023-11-30 13:29:35.059441: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:767] failed to alloc 103026786304 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
I1130 21:29:35.060191 140208603498304 run_docker.py:258] 2023-11-30 13:29:35.059927: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:767] failed to alloc 92724109312 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
I1130 21:29:35.060706 140208603498304 run_docker.py:258] 2023-11-30 13:29:35.060484: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:767] failed to alloc 83451699200 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
I1130 21:29:35.061046 140208603498304 run_docker.py:258] 2023-11-30 13:29:35.060830: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:767] failed to alloc 75106525184 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
I1130 21:29:35.061371 140208603498304 run_docker.py:258] 2023-11-30 13:29:35.061158: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:767] failed to alloc 67595870208 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
I1130 21:29:35.061750 140208603498304 run_docker.py:258] 2023-11-30 13:29:35.061529: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:767] failed to alloc 60836282368 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
I1130 21:29:35.062093 140208603498304 run_docker.py:258] 2023-11-30 13:29:35.061863: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:767] failed to alloc 54752653312 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
I1130 21:29:35.062418 140208603498304 run_docker.py:258] 2023-11-30 13:29:35.062212: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:767] failed to alloc 49277386752 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
I1130 21:29:35.062761 140208603498304 run_docker.py:258] 2023-11-30 13:29:35.062543: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:767] failed to alloc 44349648896 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
I1130 21:29:35.063068 140208603498304 run_docker.py:258] 2023-11-30 13:29:35.062867: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:767] failed to alloc 39914684416 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
I1130 21:29:35.063122 140208603498304 run_docker.py:258] 2023-11-30 13:29:35.062887: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:767] failed to alloc 35923214336 bytes unified memory; result: CUDA_ERROR_OUT_OF_MEMORY: out of memory
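
For scale: an RTX 4090 has 24 GiB of VRAM, and the AlphaFold run script sets XLA_PYTHON_CLIENT_MEM_FRACTION to 4.0 by default (see the reply below), so the first unified-memory request is roughly 4 × 24 GiB ≈ 103 GB; each subsequent retry asks for about 90% of the previous amount. A quick back-of-envelope check, assuming those stock settings:

    # Back-of-envelope check of the allocation sizes in the log above
    # (assumptions: 24 GiB RTX 4090, XLA_PYTHON_CLIENT_MEM_FRACTION = 4.0).
    vram_bytes = 24 * 1024**3             # 25,769,803,776 bytes of VRAM
    first_request = 4.0 * vram_bytes      # ~103 GB, close to the first failed request
    retries = [first_request * 0.9**i for i in range(5)]
    for r in retries:
        print(f'{int(r):,} bytes')        # shrinks by ~10% per retry, as in the log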


C10H · Nov 30 '23 15:11

Old GPUs do not support Unified Memory.

Try removing these lines from the run script:

          'TF_FORCE_UNIFIED_MEMORY': '1',
          'XLA_PYTHON_CLIENT_MEM_FRACTION': '4.0',
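
For reference, those two variables are injected by docker/run_docker.py when it launches the container. In recent AlphaFold releases the block looks roughly like the sketch below (argument list reproduced from memory, so your checkout may differ slightly); deleting or commenting out the two marked lines is the change being suggested here.

    # docker/run_docker.py (sketch; exact surrounding arguments may differ by version)
    container = client.containers.run(
        image=FLAGS.docker_image_name,
        command=command_args,
        device_requests=device_requests,
        remove=True,
        detach=True,
        mounts=mounts,
        environment={
            'NVIDIA_VISIBLE_DEVICES': FLAGS.gpu_devices,
            # Remove (or comment out) the next two lines.  They force CUDA
            # unified memory and ask JAX for a pool roughly 4x the GPU's VRAM,
            # which is what produces the CUDA_ERROR_OUT_OF_MEMORY errors above.
            'TF_FORCE_UNIFIED_MEMORY': '1',
            'XLA_PYTHON_CLIENT_MEM_FRACTION': '4.0',
        })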

sokrypton · Nov 30 '23 15:11

Maybe the workaround in #863 can help you...

elionaimc · Dec 16 '23 14:12