openfold
installation issue with cuda 12
I have tried several permutations to get OpenFold to install on my local machine, without success so far. I could use some help, as I need OpenFold as a dependency for a couple of other codes (in particular DiffDock-L). Here are my GPU, driver, and CUDA versions:
nvidia-smi
Mon Oct 14 16:54:16 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.06 Driver Version: 535.183.06 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 ... Off | 00000000:01:00.0 On | N/A |
| N/A 41C P8 15W / 80W | 59MiB / 6144MiB | 14% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1382 G /usr/lib/xorg/Xorg 55MiB |
+---------------------------------------------------------------------------------------+
The gcc/g++/gfortran 12 series on my OS is at version 12.4. I believe that 12.2 is the highest version supported by CUDA 12.1/12.2, but 12.4 is what my Debian testing repos provide.
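A quick way to compare the compiler on PATH against a version ceiling is sketched below. The ceiling of 12.2 is only the guess stated above, not a verified limit, and the helper names (`parse_version`, `exceeds_ceiling`, `installed_gcc_version`) are made up for this illustration.

```python
import re
import shutil
import subprocess

# Ceiling taken from the guess in the post above ("12.2 is the highest version
# supported"). It is an assumption: check NVIDIA's documentation before relying on it.
ASSUMED_MAX_GCC = (12, 2)


def parse_version(text):
    """Return the first dotted version in `text` as a tuple of three ints."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups(default="0"))


def exceeds_ceiling(version, ceiling=ASSUMED_MAX_GCC):
    """True when the (major, minor) pair of `version` is newer than `ceiling`."""
    return tuple(version[:2]) > tuple(ceiling[:2])


def installed_gcc_version():
    """Ask the gcc on PATH for its version; return None if that is not possible."""
    gcc = shutil.which("gcc")
    if gcc is None:
        return None
    # -dumpfullversion is available in GCC 7 and later.
    result = subprocess.run([gcc, "-dumpfullversion"], capture_output=True, text=True)
    try:
        return parse_version(result.stdout)
    except ValueError:
        return None


version = installed_gcc_version()
if version is None:
    print("could not determine the gcc version")
else:
    status = "newer than" if exceeds_ceiling(version) else "within"
    print(f"gcc {'.'.join(map(str, version))} is {status} the assumed ceiling")
```

Run it once in the plain shell and once with the conda environment activated, since the two may resolve to different compilers.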
Here are the packages in my openfold environment, built from the pl_upgrades branch so that I can use PyTorch 2 and CUDA 12:
conda list
# packages in environment at /media/Data/binaries/miniconda3/envs/openfold:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_kmp_llvm conda-forge
absl-py 2.1.0 pyhd8ed1ab_0 conda-forge
annotated-types 0.7.0 pypi_0 pypi
appdirs 1.4.4 pyh9f0ad1d_0 conda-forge
aria2 1.37.0 hbc8128a_2 conda-forge
aws-c-auth 0.7.26 hc36b679_2 conda-forge
aws-c-cal 0.7.4 h2abdd08_0 conda-forge
aws-c-common 0.9.27 h4bc722e_0 conda-forge
aws-c-compression 0.2.19 haa50ccc_0 conda-forge
aws-c-event-stream 0.4.3 h570d160_0 conda-forge
aws-c-http 0.8.8 h9b61739_1 conda-forge
aws-c-io 0.14.18 h49c7fd3_7 conda-forge
aws-c-mqtt 0.10.4 h5c8269d_18 conda-forge
aws-c-s3 0.6.4 h77088c0_11 conda-forge
aws-c-sdkutils 0.1.19 h038f3f9_2 conda-forge
aws-checksums 0.1.18 h038f3f9_10 conda-forge
awscli 2.18.3 py310hff52083_0 conda-forge
awscrt 0.21.2 py310h95a9d59_15 conda-forge
biopython 1.84 py310hc51659f_0 conda-forge
blas 2.116 mkl conda-forge
blas-devel 3.9.0 16_linux64_mkl conda-forge
brotli-python 1.1.0 py310hc6cd4ac_1 conda-forge
bzip2 1.0.8 h4bc722e_7 conda-forge
c-ares 1.33.1 heb4867d_0 conda-forge
ca-certificates 2024.8.30 hbcca054_0 conda-forge
certifi 2024.8.30 pyhd8ed1ab_0 conda-forge
cffi 1.17.0 py310h2fdcea3_0 conda-forge
charset-normalizer 3.4.0 pyhd8ed1ab_0 conda-forge
click 8.1.7 unix_pyh707e725_0 conda-forge
colorama 0.4.6 pyhd8ed1ab_0 conda-forge
contextlib2 21.6.0 pyhd8ed1ab_0 conda-forge
cryptography 40.0.2 py310h34c0648_0 conda-forge
cuda-cudart 12.1.105 0 nvidia
cuda-cupti 12.1.105 0 nvidia
cuda-libraries 12.1.0 0 nvidia
cuda-nvrtc 12.1.105 0 nvidia
cuda-nvtx 12.1.105 0 nvidia
cuda-opencl 12.4.127 0 nvidia
cuda-runtime 12.1.0 0 nvidia
cudatoolkit 11.8.0 h4ba93d1_13 conda-forge
deepspeed 0.12.4 pypi_0 pypi
distro 1.8.0 pyhd8ed1ab_0 conda-forge
dllogger 1.0.0 pypi_0 pypi
dm-tree 0.1.6 pypi_0 pypi
docker-pycreds 0.4.0 py_0 conda-forge
docutils 0.19 py310hff52083_1 conda-forge
einops 0.8.0 pypi_0 pypi
fftw 3.3.10 nompi_hf1063bd_110 conda-forge
filelock 3.16.1 pyhd8ed1ab_0 conda-forge
flash-attn 2.6.3 pypi_0 pypi
fsspec 2024.9.0 pyhff2d567_0 conda-forge
git 2.46.0 pl5321hb5640b7_0 conda-forge
gitdb 4.0.11 pyhd8ed1ab_0 conda-forge
gitpython 3.1.43 pyhd8ed1ab_0 conda-forge
gmp 6.3.0 hac33072_2 conda-forge
gmpy2 2.1.5 py310hc7909c9_1 conda-forge
hhsuite 3.3.0 py310pl5321hc31ed2c_12 bioconda
hjson 3.1.0 pypi_0 pypi
hmmer 3.4 hdbdd923_2 bioconda
icu 75.1 he02047a_0 conda-forge
idna 3.10 pyhd8ed1ab_0 conda-forge
ihm 1.3 py310h5b4e0ec_0 conda-forge
jinja2 3.1.4 pyhd8ed1ab_0 conda-forge
jmespath 1.0.1 pyhd8ed1ab_0 conda-forge
kalign2 2.04 h031d066_7 bioconda
keyutils 1.6.1 h166bdaf_0 conda-forge
krb5 1.21.3 h659f571_0 conda-forge
ld_impl_linux-64 2.43 h712a8e2_1 conda-forge
libabseil 20240116.2 cxx17_he02047a_1 conda-forge
libblas 3.9.0 16_linux64_mkl conda-forge
libcblas 3.9.0 16_linux64_mkl conda-forge
libcublas 12.1.0.26 0 nvidia
libcufft 11.0.2.4 0 nvidia
libcufile 1.9.1.3 0 nvidia
libcurand 10.3.5.147 0 nvidia
libcurl 8.9.1 hdb1bdb2_0 conda-forge
libcusolver 11.4.4.55 0 nvidia
libcusparse 12.0.2.55 0 nvidia
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 hd590300_2 conda-forge
libexpat 2.6.2 h59595ed_0 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc 7.2.0 h69d50b8_2 conda-forge
libgcc-ng 14.1.0 h77fa898_0 conda-forge
libgfortran-ng 14.1.0 h69a702a_0 conda-forge
libgfortran5 14.1.0 hc5f4f2c_0 conda-forge
libhwloc 2.11.1 default_hecaa2ac_1000 conda-forge
libiconv 1.17 hd590300_2 conda-forge
liblapack 3.9.0 16_linux64_mkl conda-forge
liblapacke 3.9.0 16_linux64_mkl conda-forge
libnghttp2 1.58.0 h47da74e_1 conda-forge
libnpp 12.0.2.50 0 nvidia
libnsl 2.0.1 hd590300_0 conda-forge
libnvjitlink 12.1.105 0 nvidia
libnvjpeg 12.1.1.14 0 nvidia
libprotobuf 4.25.3 h08a7969_0 conda-forge
libsqlite 3.46.0 hde9e2c9_0 conda-forge
libssh2 1.11.0 h0841786_0 conda-forge
libstdcxx-ng 14.1.0 hc0a3c3a_0 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libxcrypt 4.4.36 hd590300_1 conda-forge
libxml2 2.12.7 he7c6b58_4 conda-forge
libzlib 1.3.1 h4ab18f5_1 conda-forge
lightning-utilities 0.11.7 pyhd8ed1ab_0 conda-forge
llvm-openmp 15.0.7 h0cdce71_0 conda-forge
markupsafe 2.1.5 py310h2372a71_0 conda-forge
mkl 2022.1.0 h84fe81f_915 conda-forge
mkl-devel 2022.1.0 ha770c72_916 conda-forge
mkl-include 2022.1.0 h84fe81f_915 conda-forge
ml-collections 0.1.1 pyhd8ed1ab_0 conda-forge
modelcif 0.7 pyhd8ed1ab_0 conda-forge
mpc 1.3.1 h24ddda3_0 conda-forge
mpfr 4.2.1 h38ae2d0_2 conda-forge
mpmath 1.3.0 pyhd8ed1ab_0 conda-forge
msgpack-python 1.0.8 py310h25c7140_0 conda-forge
ncurses 6.5 he02047a_1 conda-forge
networkx 3.3 pyhd8ed1ab_1 conda-forge
ninja 1.11.1.1 pypi_0 pypi
numpy 1.26.0 py310hb13e2d6_0 conda-forge
ocl-icd 2.3.2 hd590300_1 conda-forge
ocl-icd-system 1.0.0 1 conda-forge
openmm 7.7.0 py310hccf1d78_1 conda-forge
openssl 3.3.1 hb9d3cd8_3 conda-forge
packaging 24.1 pyhd8ed1ab_0 conda-forge
pandas 2.2.2 py310hf9f9076_1 conda-forge
pcre2 10.44 hba22ea6_2 conda-forge
pdbfixer 1.8.1 pyh6c4a22f_0 conda-forge
perl 5.32.1 7_hd590300_perl5 conda-forge
pip 24.2 pyh8b19718_1 conda-forge
platformdirs 4.3.6 pyhd8ed1ab_0 conda-forge
prompt-toolkit 3.0.38 pyha770c72_0 conda-forge
prompt_toolkit 3.0.38 hd8ed1ab_0 conda-forge
protobuf 4.25.3 py310ha8c1f0e_0 conda-forge
psutil 6.0.0 py310hc51659f_0 conda-forge
py-cpuinfo 9.0.0 pypi_0 pypi
pycparser 2.22 pyhd8ed1ab_0 conda-forge
pydantic 2.9.2 pypi_0 pypi
pydantic-core 2.23.4 pypi_0 pypi
pynvml 11.5.3 pypi_0 pypi
pysocks 1.7.1 pyha2e5f31_6 conda-forge
python 3.10.14 hd12c33a_0_cpython conda-forge
python-dateutil 2.9.0 pyhd8ed1ab_0 conda-forge
python-tzdata 2024.2 pyhd8ed1ab_0 conda-forge
python_abi 3.10 5_cp310 conda-forge
pytorch 2.1.2 py3.10_cuda12.1_cudnn8.9.2_0 pytorch
pytorch-cuda 12.1 ha16c6d3_5 pytorch
pytorch-lightning 2.4.0 pyhd8ed1ab_0 conda-forge
pytorch-mutex 1.0 cuda pytorch
pytz 2024.2 pyhd8ed1ab_0 conda-forge
pyyaml 5.4.1 py310h5764c6d_4 conda-forge
readline 8.2 h8228510_1 conda-forge
requests 2.32.3 pyhd8ed1ab_0 conda-forge
ruamel.yaml 0.17.21 py310h1fa729e_3 conda-forge
ruamel.yaml.clib 0.2.8 py310h2372a71_0 conda-forge
s2n 1.5.1 h3400bea_0 conda-forge
scipy 1.14.1 py310ha3fb0e1_0 conda-forge
sentry-sdk 2.16.0 pyhd8ed1ab_0 conda-forge
setproctitle 1.3.3 py310h2372a71_0 conda-forge
setuptools 59.5.0 py310hff52083_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
smmap 5.0.0 pyhd8ed1ab_0 conda-forge
sympy 1.13.3 pypyh2585a3b_103 conda-forge
tbb 2021.12.0 h434a139_3 conda-forge
tk 8.6.13 noxft_h4845f30_101 conda-forge
torchmetrics 1.4.2 pyhd8ed1ab_0 conda-forge
torchtriton 2.1.0 py310 pytorch
tqdm 4.62.2 pyhd8ed1ab_0 conda-forge
typing-extensions 4.12.2 hd8ed1ab_0 conda-forge
typing_extensions 4.12.2 pyha770c72_0 conda-forge
tzdata 2024b hc8b5060_0 conda-forge
urllib3 1.26.19 pyhd8ed1ab_0 conda-forge
wandb 0.16.6 pyhd8ed1ab_1 conda-forge
wcwidth 0.2.13 pyhd8ed1ab_0 conda-forge
wheel 0.44.0 pyhd8ed1ab_0 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
yaml 0.2.5 h7f98852_2 conda-forge
zstd 1.5.6 ha6fb4c9_0 conda-forge
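The listing above contains CUDA packages from more than one release line (for example cudatoolkit 11.8.0 from conda-forge next to cuda-cudart 12.1.105 from the nvidia channel). Whether that mix contributes to the failure is not established in this thread, but a small sketch like the following can make such a mix visible. The function name `cuda_majors` and the prefix list are assumptions for illustration; libraries such as libcufft are left out on purpose because they follow their own version numbering.

```python
from collections import defaultdict

# Package-name prefixes whose version tracks the CUDA release (an assumption
# made for this sketch; extend or trim as needed).
CUDA_PREFIXES = ("cuda-", "cudatoolkit", "pytorch-cuda")


def cuda_majors(conda_list_text):
    """Map each CUDA major version to the package names that carry it."""
    majors = defaultdict(list)
    for line in conda_list_text.splitlines():
        fields = line.split()
        # Skip comment/header lines and anything without a version column.
        if line.startswith("#") or len(fields) < 2:
            continue
        name, version = fields[0], fields[1]
        if name.startswith(CUDA_PREFIXES):
            majors[version.split(".")[0]].append(name)
    return dict(majors)


# A few rows copied from the `conda list` output above.
SAMPLE = """\
cuda-cudart 12.1.105 0 nvidia
cuda-opencl 12.4.127 0 nvidia
cudatoolkit 11.8.0 h4ba93d1_13 conda-forge
pytorch-cuda 12.1 ha16c6d3_5 pytorch
numpy 1.26.0 py310hb13e2d6_0 conda-forge
"""

for major, names in sorted(cuda_majors(SAMPLE).items()):
    print(f"CUDA {major}: {', '.join(names)}")
```

To check a real environment, pass the captured text of `conda list` to `cuda_majors` instead of `SAMPLE`.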
While installing the third-party dependencies, I get the following output, which shows that the install did not complete (setup.py install runs as part of this script and fails while compiling the CUDA extension):
./scripts/install_third_party_dependencies.sh
--2024-10-14 16:41:09-- https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereo_chemical_props.txt
Resolving git.scicore.unibas.ch (git.scicore.unibas.ch)... 131.152.229.50
Connecting to git.scicore.unibas.ch (git.scicore.unibas.ch)|131.152.229.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9119 (8.9K) [text/plain]
Saving to: ‘openfold/resources/stereo_chemical_props.txt’
stereo_chemical_props.txt 100%[=================================================================================================>] 8.91K --.-KB/s in 0.001s
Last-modified header missing -- time-stamps turned off.
2024-10-14 16:41:10 (7.15 MB/s) - ‘openfold/resources/stereo_chemical_props.txt’ saved [9119/9119]
running install
/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/setuptools/command/easy_install.py:156: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running bdist_egg
running egg_info
writing openfold.egg-info/PKG-INFO
writing dependency_links to openfold.egg-info/dependency_links.txt
writing top-level names to openfold.egg-info/top_level.txt
reading manifest file 'openfold.egg-info/SOURCES.txt'
adding license file 'LICENSE'
writing manifest file 'openfold.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
copying openfold/resources/stereo_chemical_props.txt -> build/lib.linux-x86_64-3.10/openfold/resources
running build_ext
/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/torch/utils/cpp_extension.py:424: UserWarning: There are no g++ version bounds defined for CUDA version 12.1
warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
building 'attn_core_inplace_cuda' extension
Emitting ninja build file /media/Data/binaries/github/openfold-pl_upgrades/build/temp.linux-x86_64-3.10/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] /usr/bin/nvcc -I/media/Data/binaries/github/openfold-pl_upgrades/openfold/utils/kernel/csrc/ -I/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/torch/include -I/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/torch/include/TH -I/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/torch/include/THC -I/media/Data/binaries/miniconda3/envs/openfold/include/python3.10 -c -c /media/Data/binaries/github/openfold-pl_upgrades/openfold/utils/kernel/csrc/softmax_cuda_kernel.cu -o /media/Data/binaries/github/openfold-pl_upgrades/build/temp.linux-x86_64-3.10/openfold/utils/kernel/csrc/softmax_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -std=c++17 -maxrregcount=50 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda -gencode arch=compute_86,code=sm_86 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=attn_core_inplace_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: /media/Data/binaries/github/openfold-pl_upgrades/build/temp.linux-x86_64-3.10/openfold/utils/kernel/csrc/softmax_cuda_kernel.o
/usr/bin/nvcc -I/media/Data/binaries/github/openfold-pl_upgrades/openfold/utils/kernel/csrc/ -I/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/torch/include -I/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/torch/include/TH -I/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/torch/include/THC -I/media/Data/binaries/miniconda3/envs/openfold/include/python3.10 -c -c /media/Data/binaries/github/openfold-pl_upgrades/openfold/utils/kernel/csrc/softmax_cuda_kernel.cu -o /media/Data/binaries/github/openfold-pl_upgrades/build/temp.linux-x86_64-3.10/openfold/utils/kernel/csrc/softmax_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -std=c++17 -maxrregcount=50 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda -gencode arch=compute_86,code=sm_86 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=attn_core_inplace_cuda -D_GLIBCXX_USE_CXX11_ABI=0
/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/torch/include/pybind11/detail/../cast.h: In function ‘typename pybind11::detail::type_caster<typename pybind11::detail::intrinsic_type<T>::type>::cast_op_type<T> pybind11::detail::cast_op(make_caster<T>&)’:
/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/torch/include/pybind11/detail/../cast.h:45:120: error: expected template-name before ‘<’ token
45 | return caster.operator typename make_caster<T>::template cast_op_type<T>();
| ^
/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/torch/include/pybind11/detail/../cast.h:45:120: error: expected identifier before ‘<’ token
/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/torch/include/pybind11/detail/../cast.h:45:123: error: expected primary-expression before ‘>’ token
45 | return caster.operator typename make_caster<T>::template cast_op_type<T>();
| ^
/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/torch/include/pybind11/detail/../cast.h:45:126: error: expected primary-expression before ‘)’ token
45 | return caster.operator typename make_caster<T>::template cast_op_type<T>();
| ^
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2100, in _run_ninja_build
subprocess.run(
File "/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/media/Data/binaries/github/openfold-pl_upgrades/setup.py", line 113, in <module>
setup(
File "/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/distutils/core.py", line 148, in setup
dist.run_commands()
File "/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/setuptools/command/install.py", line 74, in run
self.do_egg_install()
File "/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/setuptools/command/install.py", line 116, in do_egg_install
self.run_command('bdist_egg')
File "/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/setuptools/command/bdist_egg.py", line 164, in run
cmd = self.call_command('install_lib', warn_dir=0)
File "/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/setuptools/command/bdist_egg.py", line 150, in call_command
self.run_command(cmdname)
File "/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/setuptools/command/install_lib.py", line 11, in run
self.build()
File "/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/distutils/command/install_lib.py", line 107, in build
self.run_command('build_ext')
File "/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 873, in build_extensions
build_ext.build_extensions(self)
File "/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/distutils/command/build_ext.py", line 449, in build_extensions
self._build_extensions_serial()
File "/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/distutils/command/build_ext.py", line 474, in _build_extensions_serial
self.build_extension(ext)
File "/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
_build_ext.build_extension(self, ext)
File "/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/distutils/command/build_ext.py", line 529, in build_extension
objects = self.compiler.compile(sources,
File "/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 686, in unix_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1774, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "/media/Data/binaries/miniconda3/envs/openfold/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2116, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
Download CUTLASS, required for Deepspeed Evoformer attention kernel
Cloning into 'cutlass'...
remote: Enumerating objects: 6103, done.
remote: Counting objects: 100% (6103/6103), done.
remote: Compressing objects: 100% (1797/1797), done.
remote: Total 6103 (delta 3528), reused 4982 (delta 3018), pack-reused 0 (from 0)
Receiving objects: 100% (6103/6103), 27.71 MiB | 4.72 MiB/s, done.
Resolving deltas: 100% (3528/3528), done.
To make your changes take effect please reactivate your environment
To make your changes take effect please reactivate your environment
This is where I am stuck. I don't really know what to do with the "Error compiling objects for extension". I have already looked at #403, #462, and #477 and have done my best to implement their suggestions, but I obviously do not have a fully working environment.
@blakemertz Are you sure you are using your OS's gcc? Could you please activate your environment and run which gcc and gcc -v? If the version happens to be 13.3, could you please try mamba install gcc=12.4? This fixed it for me.
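The same check can be scripted for several tools at once. The sketch below reports whether gcc, g++, and nvcc resolve to the active conda environment or to somewhere else; it relies on the CONDA_PREFIX variable that conda sets on activation, and the helper names are hypothetical. For reference, the failing build log above invokes /usr/bin/nvcc, which is outside the environment.

```python
import os
import shutil


def inside_prefix(path, prefix):
    """True when `path` lies under the directory `prefix`."""
    path = os.path.realpath(path)
    prefix = os.path.realpath(prefix)
    return os.path.commonpath([path, prefix]) == prefix


def report_tools(tools=("gcc", "g++", "nvcc")):
    """Return (tool, resolved path, origin) for each tool looked up on PATH."""
    prefix = os.environ.get("CONDA_PREFIX")  # set by `conda activate`
    rows = []
    for tool in tools:
        found = shutil.which(tool)
        if found is None:
            origin = "not found"
        elif prefix and inside_prefix(found, prefix):
            origin = "conda env"
        else:
            origin = "outside conda env"
        rows.append((tool, found, origin))
    return rows


for tool, found, origin in report_tools():
    print(f"{tool:5s} {origin:18s} {found}")
```

Run it with the openfold environment activated; any row marked "outside conda env" is a tool the build would take from the OS instead.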
@vaclavhanzl thanks for responding. My OS gcc is v12. I specifically deleted the existing symlink to gcc 14 and recreated it to point to gcc 12, then checked with gcc -v both in my OS and in my openfold environment. I will double-check, also try installing gcc=12.4 with mamba, and let you know if that fixes the issue.
@blakemertz Please try this environment from my PR #496
@vaclavhanzl thanks for sharing. I noticed you are using your own cuda tools (not included in environment.yml). Are you installing from your Debian repositories or pulling them from the nvidia channel in conda?
Update: never mind, I saw that it pulled in cudatoolkit (v 11.8) when I created the environment.
@vaclavhanzl thanks again for all your help. My guess is that the interdependencies between gcc, numpy < 2, and PyTorch with CUDA 12 were breaking my original environment. This was a time-consuming task on your part, and it is much appreciated.
When I ran the unit tests after setting up the environment, 8 tests failed. Modifying two of the Python scripts in the tests directory as per #467 reduced that to a single failure:
./scripts/run_unit_tests.sh
[2024-10-22 21:41:10,915] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
s.................Using /home/centos/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/centos/.cache/torch_extensions/py310_cu121/evoformer_attn/build.ninja...
Building extension module evoformer_attn...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module evoformer_attn...
Time to load evoformer_attn op: 0.2760050296783447 seconds
............s...s.sss.ss.E...sssssssss.sss....ssssss..s.s.s.ss.s......s.s..ss...ss.s.s....s........
======================================================================
ERROR: test_import_jax_weights_ (tests.test_import_weights.TestImportWeights)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/shared/binaries/github/openfold/tests/test_import_weights.py", line 37, in test_import_jax_weights_
import_jax_weights_(
File "/shared/binaries/github/openfold/openfold/utils/import_weights.py", line 650, in import_jax_weights_
data = np.load(npz_path)
File "/shared/miniconda3/envs/openfold/lib/python3.10/site-packages/numpy/lib/npyio.py", line 427, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '/shared/binaries/github/openfold/tests/openfold/resources/params/params_model_1_ptm.npz'
----------------------------------------------------------------------
Ran 117 tests in 56.967s
FAILED (errors=1, skipped=41)
Test(s) failed. Make sure you've installed all Python dependencies.
I suppose one could point explicitly to the params_model_1_ptm.npz file by passing the --jax_param_path flag, but I am not sure of the exact syntax. I will consider this closed for now. I hope your pull request gets merged into the pl_upgrades branch, because I am sure plenty of users are running CUDA 12 and PyTorch 2 right now.
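Before rerunning the tests, it may help to confirm whether the parameter file is present where the traceback above says it was looked for. This is only a sketch: the relative path is copied from the FileNotFoundError, the function name `missing_params` is hypothetical, and nothing here says how the file is meant to be obtained.

```python
from pathlib import Path

# Location relative to the repository root, copied from the FileNotFoundError above.
PARAMS_RELATIVE = Path("tests/openfold/resources/params/params_model_1_ptm.npz")


def missing_params(repo_root):
    """Return the expected parameter file path if it is absent, else None."""
    expected = Path(repo_root) / PARAMS_RELATIVE
    return None if expected.is_file() else expected


# Assumes the script is started from the top of the openfold checkout.
missing = missing_params(".")
if missing is None:
    print("parameter file found")
else:
    print(f"parameter file missing: {missing}")
```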
@blakemertz Thanks for all the tests. To answer your question (sorry, it was too late at night here when I saw it): as you already noticed, most things come from the environment.yml. My latest PR #496 further limits what is used from the OS distribution; I guess it is now just the kernel module. For others coming here via searches, I'll document things in more detail. To get the kernel module, I did this on my Debian testing:
apt-get install nvidia-cuda-dev nvidia-cuda-toolkit linux-image-amd64 linux-headers-amd64
while having this in /etc/apt/sources.list:
deb http://deb.debian.org/debian/ testing main contrib non-free non-free-firmware
deb-src http://deb.debian.org/debian/ testing main contrib non-free non-free-firmware
deb http://security.debian.org/debian-security testing-security main contrib non-free non-free-firmware
deb-src http://security.debian.org/debian-security testing-security main contrib non-free non-free-firmware
deb http://deb.debian.org/debian/ testing-updates main contrib non-free non-free-firmware
deb-src http://deb.debian.org/debian/ testing-updates main contrib non-free non-free-firmware
Note that I explicitly avoided anything from the Nvidia website (I appreciate their nice efforts but using just the Debian repos is much simpler).
Even my apt-get setup is probably still overkill, installing things that will not be used. All you want at the OS level is to get nvidia-smi working:
hanzl@blackbox:~$ nvidia-smi
Wed Oct 23 09:42:53 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.06 Driver Version: 535.183.06 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
hanzl@blackbox:~$ cat /proc/version
Linux version 6.11.2-amd64 ([email protected]) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-6) 14.2.0, GNU ld (GNU Binutils for Debian) 2.43.1) #1 SMP PREEMPT_DYNAMIC Debian 6.11.2-1 (2024-10-05)
Using the environment with #496 applied, I get these versions:
(test_env5) hanzl@blackbox:~$ which nvcc
/home/hanzl/miniforge3/envs/test_env5/bin/nvcc
(test_env5) hanzl@blackbox:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
(test_env5) hanzl@blackbox:~$ which gcc
/home/hanzl/miniforge3/envs/test_env5/bin/gcc
(test_env5) hanzl@blackbox:~$ gcc --version
gcc (conda-forge gcc 12.4.0-0) 12.4.0
I did many desperate things in the past while trying to install OpenFold (all my other posts here are likely obsoleted by this one). If you are reading this, you have likely had your share of this pain, too. I learned that, apart from installing what works, it is even more important to uninstall what you installed earlier while searching for a way. Seriously, if a clean OS install is possible for you, it is a good start. Your previous experiments have likely left you in a minefield of pitfalls that make debugging OpenFold's own problems extremely hard. You may try some cleanups I did in the past:
If your monitor is NOT plugged into your GPU (and you use the GPU just for CUDA), you may do things as drastic as:
apt-get remove 'nvidia-*' 'libnvidia-*'
etc., until dpkg -l | grep nvidia returns nothing. Maybe do something similar for packages with 'cuda' in the name.
Equally important is to clean up anything Python related. If you experimented with various ways of making Python virtual environments, you can have nasty landmines waiting in some very obscure places, triggered only for certain versions of Python. The search for a good Python version in a good environment for OpenFold can easily be spoiled by this. Check the directories on Python's import path (sys.path); there may be remnants of some old torch. My ghost was hidden in /home/hanzl/.local/lib/python3.9/site-packages.
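The sys.path check can be sketched as follows: walk the import path and list any copy of a package that sits outside the active environment. The function name `find_stray_packages` and its parameters are hypothetical, and the sketch only looks for package directories, not for single-file modules or .pth tricks.

```python
import os
import sys


def find_stray_packages(package="torch", search_path=None, env_prefix=None):
    """List copies of `package` on the import path that live outside `env_prefix`."""
    search_path = sys.path if search_path is None else search_path
    env_prefix = os.path.realpath(env_prefix or sys.prefix)
    strays = []
    for entry in search_path:
        # Skip empty entries (the current directory) and anything that is not a directory.
        if not entry or not os.path.isdir(entry):
            continue
        candidate = os.path.join(os.path.realpath(entry), package)
        if not os.path.isdir(candidate):
            continue
        if os.path.commonpath([candidate, env_prefix]) != env_prefix:
            strays.append(candidate)
    return strays


for path in find_stray_packages():
    print("copy outside the active environment:", path)
```

Run it with the interpreter of the environment you intend to use; an entry under a directory such as ~/.local/lib/pythonX.Y/site-packages is the kind of leftover described above.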
@blakemertz And for this issue 494 - I guess it should stay open until PR #496 (or something similar) is merged?
PR #496 is now merged so I think this issue could be closed (please @blakemertz - looks like I cannot do that but you could, thanks).