ADOP
Can't compile install_pytorch on Manjaro
Hi. I'm trying to get ADOP working on Manjaro with a freshly installed conda.
When I run ./install_pytorch.sh, the compilation fails with this error:
FAILED: caffe2/CMakeFiles/torch_cuda.dir/utils/torch_cuda_generated_math_gpu.cu.o
cd /run/media/metya/B634BE3A34BDFE05/Projects/ADOP/External/pytorch/build/caffe2/CMakeFiles/torch_cuda.dir/utils && /home/metya/.conda/envs/adop/bin/cmake -E make_directory /run/media/metya/B634BE3A34BDFE05/Projects/ADOP/External/pytorch/build/caffe2/CMakeFiles/torch_cuda.dir/utils/. && /home/metya/.conda/envs/adop/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=Release -D generated_file:STRING=/run/media/metya/B634BE3A34BDFE05/Projects/ADOP/External/pytorch/build/caffe2/CMakeFiles/torch_cuda.dir/utils/./torch_cuda_generated_math_gpu.cu.o -D generated_cubin_file:STRING=/run/media/metya/B634BE3A34BDFE05/Projects/ADOP/External/pytorch/build/caffe2/CMakeFiles/torch_cuda.dir/utils/./torch_cuda_generated_math_gpu.cu.o.cubin.txt -P /run/media/metya/B634BE3A34BDFE05/Projects/ADOP/External/pytorch/build/caffe2/CMakeFiles/torch_cuda.dir/utils/torch_cuda_generated_math_gpu.cu.o.Release.cmake
/home/metya/ADOP/External/pytorch/caffe2/utils/math_gpu.cu(149): warning: the "__visibility__" attribute can only appear on functions and variables with external linkage
/home/metya/ADOP/External/pytorch/caffe2/utils/math_gpu.cu(200): warning: the "__visibility__" attribute can only appear on functions and variables with external linkage
/home/metya/ADOP/External/pytorch/caffe2/utils/math_gpu.cu(236): warning: the "__visibility__" attribute can only appear on functions and variables with external linkage
/home/metya/ADOP/External/pytorch/caffe2/utils/math_gpu.cu(908): error: namespace "thrust" has no member "host_vector"
/home/metya/ADOP/External/pytorch/caffe2/utils/math_gpu.cu(908): error: expected an expression
/home/metya/ADOP/External/pytorch/caffe2/utils/math_gpu.cu(909): error: namespace "thrust" has no member "host_vector"
/home/metya/ADOP/External/pytorch/caffe2/utils/math_gpu.cu(909): error: expected an expression
/home/metya/ADOP/External/pytorch/caffe2/utils/math_gpu.cu(910): error: namespace "thrust" has no member "host_vector"
/home/metya/ADOP/External/pytorch/caffe2/utils/math_gpu.cu(910): error: type name is not allowed
/home/metya/ADOP/External/pytorch/caffe2/utils/math_gpu.cu(910): error: expected an expression
/home/metya/ADOP/External/pytorch/caffe2/utils/math_gpu.cu(912): error: identifier "A_array" is undefined
/home/metya/ADOP/External/pytorch/caffe2/utils/math_gpu.cu(913): error: identifier "B_array" is undefined
/home/metya/ADOP/External/pytorch/caffe2/utils/math_gpu.cu(914): error: identifier "C_array" is undefined
/home/metya/ADOP/External/pytorch/caffe2/utils/math_gpu.cu(917): error: identifier "A_array" is undefined
/home/metya/ADOP/External/pytorch/caffe2/utils/math_gpu.cu(919): error: identifier "B_array" is undefined
/home/metya/ADOP/External/pytorch/caffe2/utils/math_gpu.cu(920): error: identifier "C_array" is undefined
/home/metya/ADOP/External/pytorch/caffe2/utils/math_gpu.cu(1763): warning: the "__visibility__" attribute can only appear on functions and variables with external linkage
/home/metya/ADOP/External/pytorch/caffe2/utils/math_gpu.cu(2234): warning: the "__visibility__" attribute can only appear on functions and variables with external linkage
/home/metya/ADOP/External/pytorch/caffe2/utils/math_gpu.cu(2282): warning: the "__visibility__" attribute can only appear on functions and variables with external linkage
/home/metya/ADOP/External/pytorch/caffe2/utils/math_gpu.cu(2846): warning: the "__visibility__" attribute can only appear on functions and variables with external linkage
13 errors detected in the compilation of "/home/metya/ADOP/External/pytorch/caffe2/utils/math_gpu.cu".
CMake Error at torch_cuda_generated_math_gpu.cu.o.Release.cmake:281 (message):
Error generating file
/home/metya/ADOP/External/pytorch/build/caffe2/CMakeFiles/torch_cuda.dir/utils/./torch_cuda_generated_math_gpu.cu.o
[5182/6125] Building NVCC (Device) object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/torch_cuda_generated_UniqueCub.cu.o
ninja: build stopped: subcommand failed.
Then I followed the instructions from https://github.com/darglein/ADOP/issues/6, as @blurgyy suggested, i.e. installed cuda-11.1 from the AUR and prepended it to CUDA_PATH in cuda.sh. It still didn't work: compiling with install_pytorch.sh fails with the same error every time.
I'm running Manjaro on kernel 15.4.10.
My conda env, created following the instructions:
# packages in environment at /home/metya/.conda/envs/adop:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 4.5 1_gnu
astunparse 1.6.3 py_0
blas 1.0 mkl
brotlipy 0.7.0 py39h27cfd23_1003
bzip2 1.0.8 h7b6447c_0
c-ares 1.17.1 h27cfd23_0
ca-certificates 2021.10.26 h06a4308_2
certifi 2021.10.8 py39h06a4308_0
cffi 1.14.6 py39h400218f_0
charset-normalizer 2.0.4 pyhd3eb1b0_0
cmake 3.19.6 h973ab73_0
cryptography 35.0.0 py39hd23ed53_0
cudatoolkit 11.2.72 h2bc3f7f_0 nvidia
cudatoolkit-dev 11.2.2 py39h3811e60_0 conda-forge
cudnn 8.2.1.32 h86fa8c9_0 conda-forge
dataclasses 0.8 pyh6d0b6a4_7
expat 2.4.1 h2531618_2
freeimage 3.17.0 0 conda-forge
future 0.18.2 py39h06a4308_1
idna 3.2 pyhd3eb1b0_0
intel-openmp 2021.4.0 h06a4308_3561
jpeg 9d h36c2ea0_0 conda-forge
krb5 1.19.2 hac12032_0
ld_impl_linux-64 2.35.1 h7274673_9
libcurl 7.78.0 h0b77cf5_0
libedit 3.1.20210910 h7f8727e_0
libev 4.33 h7f8727e_1
libffi 3.3 he6710b0_2
libgcc-ng 9.3.0 h5101ec6_17
libgomp 9.3.0 h5101ec6_17
libnghttp2 1.41.0 hf8bcb03_2
libssh2 1.9.0 h1ba5d50_1
libstdcxx-ng 9.3.0 hd4cf53a_17
libuv 1.40.0 h7b6447c_0
lz4-c 1.9.3 h295c915_1
magma-cuda110 2.5.2 1 pytorch
mkl 2021.4.0 h06a4308_640
mkl-include 2021.4.0 h06a4308_640
mkl-service 2.4.0 py39h7f8727e_0
mkl_fft 1.3.1 py39hd3c417c_0
mkl_random 1.2.2 py39h51133e4_0
ncurses 6.3 h7f8727e_2
ninja 1.10.2 py39hd09550d_3
numpy 1.21.2 py39h20f2e39_0
numpy-base 1.21.2 py39h79a1101_0
openssl 1.1.1l h7f8727e_0
pip 21.2.4 py39h06a4308_0
pybind11 2.6.2 py39hff7bd54_1
pycparser 2.21 pyhd3eb1b0_0
pyopenssl 21.0.0 pyhd3eb1b0_1
pysocks 1.7.1 py39h06a4308_0
python 3.9.7 h12debd9_1
python_abi 3.9 2_cp39 conda-forge
pyyaml 6.0 py39h7f8727e_1
readline 8.1 h27cfd23_0
requests 2.26.0 pyhd3eb1b0_0
rhash 1.4.1 h3c74f83_1
setuptools 58.0.4 py39h06a4308_0
six 1.16.0 pyhd3eb1b0_0
sqlite 3.36.0 hc218d9a_0
tk 8.6.11 h1ccaba5_0
typing_extensions 3.10.0.2 pyh06a4308_0
tzdata 2021e hda174b7_0
urllib3 1.26.7 pyhd3eb1b0_0
wheel 0.37.0 pyhd3eb1b0_1
xz 5.2.5 h7b6447c_0
yaml 0.2.5 h7b6447c_0
zlib 1.2.11 h7b6447c_3
zstd 1.4.9 haebb681_0
Ah, a few patches ago I added the following line to prevent problems when multiple CUDA versions are installed. Can you try without that line:
https://github.com/darglein/ADOP/blob/1339370f62ebdb80905fa448edbe3b5f818a6fb1/install_pytorch.sh#L29
I used the latest version of your repository, so that line was already there.
I don't know how to deal with this, especially since I don't understand what the compiler is saying.
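For what it's worth, the first real errors in the log say that nvcc did not see a declaration of thrust::host_vector at math_gpu.cu lines 908-910; the later "A_array is undefined" errors presumably just follow from those failed declarations. That name is normally declared in the header thrust/host_vector.h, which ships inside the CUDA toolkit's include directory. A minimal sketch for checking which candidate toolkit roots actually contain that header is below. The function names are hypothetical, and whether a missing or mismatched Thrust header is the cause here is only an assumption.

```shell
#!/usr/bin/env bash
# Sketch: check candidate CUDA toolkit roots for the Thrust header that
# declares thrust::host_vector.
# ASSUMPTION: function names are hypothetical helpers, not part of ADOP.

# Succeeds if <cuda_root>/include/thrust/host_vector.h exists.
has_thrust_host_vector() {
    local cuda_root="$1"
    [ -f "$cuda_root/include/thrust/host_vector.h" ]
}

# Print one status line per toolkit root passed as an argument.
check_cuda_roots() {
    local root
    for root in "$@"; do
        if has_thrust_host_vector "$root"; then
            echo "ok       $root"
        else
            echo "missing  $root"
        fi
    done
}

# Example call (the paths are only examples of where a toolkit might live):
#   check_cuda_roots "$CUDA_PATH" "$CONDA_PREFIX" /opt/cuda
```

If more than one root reports "ok", several toolkits are installed side by side, and it is worth checking which one the build actually picks up.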
Alright, I installed a minimal Manjaro VM on my system and compiled ADOP successfully using the following steps:
- Install nvidia drivers (don't need to install CUDA to the system)
- Install anaconda from the website (+check that it works in your shell)
- Install gcc9 with yay
- Update ADOP to the latest commit
- ./create_environment.sh
- ./install_pytorch.sh
- ./build_adop.sh
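The three script steps above can be sketched as a small wrapper that runs them in order from the repository root and stops at the first failure. The function name run_adop_setup is hypothetical. The driver, anaconda and gcc9 steps are left out because they are one-time system setup.

```shell
#!/usr/bin/env bash
# Sketch: run the ADOP setup scripts in the order listed above and stop
# as soon as one of them fails.
# ASSUMPTION: run_adop_setup is a hypothetical helper, not part of ADOP.

run_adop_setup() {
    local adop_root="$1"
    local step
    for step in create_environment.sh install_pytorch.sh build_adop.sh; do
        echo "==> $step"
        # Run each script from the repository root in a subshell so the
        # caller's working directory is left unchanged.
        ( cd "$adop_root" && "./$step" ) || {
            echo "step failed: $step" >&2
            return 1
        }
    done
}

# Example call:
#   run_adop_setup /path/to/ADOP
```

Stopping at the first failing script keeps the relevant error at the end of the terminal output instead of burying it under later steps.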
Hmm. I tried several times to compile pytorch with different approaches, but it failed every time with this error.
But OK! It seems the mess is on my side)
I'll try again with a freshly installed Linux kernel, NVIDIA drivers, and conda env.
Thank you, I'll let you know whether it works)
Can you post the cmake output of pytorch here?
To do so, remove the directory ADOP/External/pytorch/build and then run ./install_pytorch.sh. Stop when it starts compiling the first source file and copy-paste the cmake output above that.
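A minimal sketch of that procedure, which also saves everything the script prints to a log file so nothing is lost to terminal scrollback. The function name and the default log file name are hypothetical, and the build directory path is taken from the paths in the build log above.

```shell
#!/usr/bin/env bash
# Sketch: delete the PyTorch build directory, rerun the install script,
# and keep a copy of its output in a log file.
# ASSUMPTION: rebuild_and_log and the default log file name are
# hypothetical; the build path follows the paths shown in the build log.

rebuild_and_log() {
    local adop_root="$1"
    local log_file="${2:-pytorch_cmake_output.log}"
    local build_dir="$adop_root/External/pytorch/build"

    # Remove the old build tree so CMake configures from scratch.
    rm -rf -- "$build_dir"

    # Run the script from the repository root; send stdout and stderr
    # both to the terminal and to the log file.
    ( cd "$adop_root" && ./install_pytorch.sh 2>&1 ) | tee "$log_file"
}

# Example call:
#   rebuild_and_log /path/to/ADOP
# Press Ctrl+C once the first source file starts compiling; the cmake
# output is then everything in the log above that point.
```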