Pytorch-Correlation-extension
OSError: CUDA_HOME environment variable not set when running python setup.py in a Dockerfile
My Dockerfile
FROM pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime
RUN apt-get update && apt-get install -y git gcc build-essential
RUN mkdir /app
WORKDIR /app
# Install Pytorch Correlation
RUN git clone https://github.com/ClementPinard/Pytorch-Correlation-extension.git
RUN cd Pytorch-Correlation-extension && python setup.py install
RUN cd -
EXPOSE 5252
CMD ["python", "app.py"]
The build then raises an error:
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
The full error logs:
=> ERROR [13/14] RUN cd Pytorch-Correlation-extension && python setup.py install 2.2s
------
> [13/14] RUN cd Pytorch-Correlation-extension && python setup.py install:
#0 1.843 Traceback (most recent call last):
#0 1.843 File "/app/Pytorch-Correlation-extension/setup.py", line 57, in <module>
#0 1.843 launch_setup()
#0 1.844 File "/app/Pytorch-Correlation-extension/setup.py", line 36, in launch_setup
#0 1.844 Extension('spatial_correlation_sampler_backend',
#0 1.844 File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1048, in CUDAExtension
#0 1.844 library_dirs += library_paths(cuda=True)
#0 1.844 File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1179, in library_paths
#0 1.845 if (not os.path.exists(_join_cuda_home(lib_dir)) and
#0 1.845 File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2223, in _join_cuda_home
#0 1.845 raise EnvironmentError('CUDA_HOME environment variable is not set. '
#0 1.845 OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
------
Dockerfile:33
--------------------
31 | # Install Pytorch Correlation
32 | RUN git clone https://github.com/ClementPinard/Pytorch-Correlation-extension.git
33 | >>> RUN cd Pytorch-Correlation-extension && python setup.py install
34 | RUN cd -
35 |
--------------------
ERROR: failed to solve: process "/bin/sh -c cd Pytorch-Correlation-extension && python setup.py install" did not complete successfully: exit code: 1
Hi, it looks to me like you need to use the devel image and not the runtime one, since you need to be able to compile against torch and CUDA. So I would try changing the Docker image name from pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime to pytorch/pytorch:2.0.0-cuda11.7-cudnn8-devel.
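For reference, a minimal sketch of that change, assuming the rest of the Dockerfile stays as posted above. Only the FROM line differs from the original.

```dockerfile
# The devel image ships the CUDA toolkit under /usr/local/cuda,
# which the runtime image does not include.
FROM pytorch/pytorch:2.0.0-cuda11.7-cudnn8-devel

RUN apt-get update && apt-get install -y git gcc build-essential
WORKDIR /app

# Install Pytorch Correlation
RUN git clone https://github.com/ClementPinard/Pytorch-Correlation-extension.git
RUN cd Pytorch-Correlation-extension && python setup.py install
```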
Hi @ClementPinard, thank you for your advice.
I changed the Docker image name from pytorch/pytorch:2.0.0-cuda11.7-cudnn8-runtime to pytorch/pytorch:2.0.0-cuda11.7-cudnn8-devel.
The former issue is fixed, but I now have a new one:
=> ERROR [13/14] RUN cd Pytorch-Correlation-extension && python setup.py install 15.9s
------
> [13/14] RUN cd Pytorch-Correlation-extension && python setup.py install:
#0 1.665 No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
#0 1.689 running install
#0 1.689 /opt/conda/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
#0 1.689 warnings.warn(
#0 1.752 /opt/conda/lib/python3.10/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
#0 1.752 warnings.warn(
#0 1.818 running bdist_egg
#0 1.830 running egg_info
#0 1.830 creating Correlation_Module/spatial_correlation_sampler.egg-info
#0 1.835 writing Correlation_Module/spatial_correlation_sampler.egg-info/PKG-INFO
#0 1.836 writing dependency_links to Correlation_Module/spatial_correlation_sampler.egg-info/dependency_links.txt
#0 1.836 writing requirements to Correlation_Module/spatial_correlation_sampler.egg-info/requires.txt
#0 1.836 writing top-level names to Correlation_Module/spatial_correlation_sampler.egg-info/top_level.txt
#0 1.836 writing manifest file 'Correlation_Module/spatial_correlation_sampler.egg-info/SOURCES.txt'
#0 1.842 /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
#0 1.842 warnings.warn(msg.format('we could not find ninja.'))
#0 1.846 reading manifest file 'Correlation_Module/spatial_correlation_sampler.egg-info/SOURCES.txt'
#0 1.847 adding license file 'LICENSE'
#0 1.847 writing manifest file 'Correlation_Module/spatial_correlation_sampler.egg-info/SOURCES.txt'
#0 1.848 installing library code to build/bdist.linux-x86_64/egg
#0 1.848 running install_lib
#0 1.848 running build_py
#0 1.849 creating build
#0 1.849 creating build/lib.linux-x86_64-cpython-310
#0 1.849 creating build/lib.linux-x86_64-cpython-310/spatial_correlation_sampler
#0 1.849 copying Correlation_Module/spatial_correlation_sampler/spatial_correlation_sampler.py -> build/lib.linux-x86_64-cpython-310/spatial_correlation_sampler
#0 1.850 copying Correlation_Module/spatial_correlation_sampler/__init__.py -> build/lib.linux-x86_64-cpython-310/spatial_correlation_sampler
#0 1.850 running build_ext
#0 1.868 building 'spatial_correlation_sampler_backend' extension
#0 1.868 creating build/temp.linux-x86_64-cpython-310
#0 1.868 creating build/temp.linux-x86_64-cpython-310/Correlation_Module
#0 1.869 gcc -pthread -B /opt/conda/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/include -fPIC -O2 -isystem /opt/conda/include -fPIC -DUSE_CUDA -I/opt/conda/lib/python3.10/site-packages/torch/include -I/opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.10 -c Correlation_Module/correlation.cpp -o build/temp.linux-x86_64-cpython-310/Correlation_Module/correlation.o -std=c++14 -fopenmp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=spatial_correlation_sampler_backend -D_GLIBCXX_USE_CXX11_ABI=0
#0 15.65 Traceback (most recent call last):
#0 15.65 File "/app/Pytorch-Correlation-extension/setup.py", line 69, in <module>
#0 15.65 launch_setup()
#0 15.65 File "/app/Pytorch-Correlation-extension/setup.py", line 37, in launch_setup
#0 15.65 setup(
#0 15.65 File "/opt/conda/lib/python3.10/site-packages/setuptools/__init__.py", line 87, in setup
#0 15.65 return distutils.core.setup(**attrs)
#0 15.65 File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
#0 15.65 return run_commands(dist)
#0 15.65 File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
#0 15.65 dist.run_commands()
#0 15.65 File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
#0 15.65 self.run_command(cmd)
#0 15.65 File "/opt/conda/lib/python3.10/site-packages/setuptools/dist.py", line 1208, in run_command
#0 15.65 super().run_command(command)
#0 15.65 File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
#0 15.65 cmd_obj.run()
#0 15.65 File "/opt/conda/lib/python3.10/site-packages/setuptools/command/install.py", line 74, in run
#0 15.66 File "/opt/conda/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
#0 15.66 _build_ext.build_extension(self, ext)
#0 15.66 File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 549, in build_extension
#0 15.66 objects = self.compiler.compile(
#0 15.66 File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/ccompiler.py", line 599, in compile
#0 15.66 self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
#0 15.66 File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 581, in unix_wrap_single_compile
#0 15.66 cflags = unix_cuda_flags(cflags)
#0 15.66 File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 548, in unix_cuda_flags
#0 15.66 cflags + _get_cuda_arch_flags(cflags))
#0 15.66 File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1773, in _get_cuda_arch_flags
#0 15.66 arch_list[-1] += '+PTX'
#0 15.66 IndexError: list index out of range
------
Dockerfile:33
--------------------
31 | # Install Pytorch Correlation
32 | RUN git clone https://github.com/ClementPinard/Pytorch-Correlation-extension.git
33 | >>> RUN cd Pytorch-Correlation-extension && python setup.py install
34 | RUN cd -
35 |
--------------------
ERROR: failed to solve: process "/bin/sh -c cd Pytorch-Correlation-extension && python setup.py install" did not complete successfully: exit code: 1
Docker build failed with error: Command 'docker build -t sam-track:1.0.0 ..' returned non-zero exit status 1.
See this related issue: https://github.com/ClementPinard/Pytorch-Correlation-extension/issues/90
The GPU is not available during docker build, so torch cannot detect any device architecture to compile for, which is why the arch list ends up empty. You need to figure out your compute capabilities beforehand and set the TORCH_CUDA_ARCH_LIST environment variable accordingly.
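A minimal sketch of what that could look like in the Dockerfile. The value 8.6 is only an assumption for illustration; replace it with the compute capability of your own GPU, which you can read on the target machine with torch.cuda.get_device_capability().

```dockerfile
FROM pytorch/pytorch:2.0.0-cuda11.7-cudnn8-devel

# Assumed value: 8.6 is a placeholder for your GPU's compute capability.
# It must be set before the extension is compiled, since no GPU is
# visible during docker build.
ENV TORCH_CUDA_ARCH_LIST="8.6"

WORKDIR /app
RUN git clone https://github.com/ClementPinard/Pytorch-Correlation-extension.git
RUN cd Pytorch-Correlation-extension && python setup.py install
```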
Hi @ClementPinard, thank you for your solution. However, I may need to deploy my Docker image to different computers. Is there a general way to solve the TORCH_CUDA_ARCH_LIST environment variable issue?
If you don't know what the GPU compute capabilities of your machines will be, your best bet is to compile for as many architectures as possible, or to wait until the container is launched to compile the library. Compiled code cannot be generic.
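As a rough sketch of the first option, the build can target several architectures at once. The list below is an assumption chosen to stay within what the CUDA 11.7 toolkit can compile for; trim or extend it to match the GPUs you expect. The +PTX suffix on the last entry also embeds PTX, which newer GPUs can JIT-compile at load time. Expect a longer build and a larger binary.

```dockerfile
FROM pytorch/pytorch:2.0.0-cuda11.7-cudnn8-devel

# Assumed architecture list, not taken from the thread; adjust as needed.
ENV TORCH_CUDA_ARCH_LIST="6.0;6.1;7.0;7.5;8.0;8.6+PTX"

WORKDIR /app
RUN git clone https://github.com/ClementPinard/Pytorch-Correlation-extension.git
RUN cd Pytorch-Correlation-extension && python setup.py install

# Alternative (second option): skip the RUN install above and compile when
# the container starts with GPU access, e.g. docker run --gpus all ...
# CMD ["sh", "-c", "cd /app/Pytorch-Correlation-extension && python setup.py install && cd /app && python app.py"]
```

The second option keeps the image generic but needs the devel image at runtime and repeats the compilation each time a fresh container starts.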