apex
Apex installation failed
I was trying to install apex in a Dockerfile (Python 3.6, CUDA 11.1) with the following commands:
RUN git clone https://github.com/NVIDIA/apex && \
cd apex && \
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
and got the following errors. It built fine two days ago but fails now, and the failure seems to be related to csrc/fused_dense_cuda.cu:
csrc/fused_dense_cuda.cu(415): error: identifier "CUBLASLT_EPILOGUE_GELU_AUX" is undefined
csrc/fused_dense_cuda.cu(427): error: identifier "CUBLASLT_MATMUL_DESC_EPILOGUE_AUX_POINTER" is undefined
csrc/fused_dense_cuda.cu(428): error: identifier "CUBLASLT_MATMUL_DESC_EPILOGUE_AUX_LD" is undefined
csrc/fused_dense_cuda.cu(435): error: identifier "CUBLASLT_EPILOGUE_GELU_AUX_BIAS" is undefined
csrc/fused_dense_cuda.cu(555): error: identifier "CUBLASLT_EPILOGUE_GELU_AUX" is undefined
csrc/fused_dense_cuda.cu(567): error: identifier "CUBLASLT_MATMUL_DESC_EPILOGUE_AUX_POINTER" is undefined
csrc/fused_dense_cuda.cu(568): error: identifier "CUBLASLT_MATMUL_DESC_EPILOGUE_AUX_LD" is undefined
csrc/fused_dense_cuda.cu(575): error: identifier "CUBLASLT_EPILOGUE_GELU_AUX_BIAS" is undefined
csrc/fused_dense_cuda.cu(687): error: identifier "CUBLASLT_EPILOGUE_BGRADB" is undefined
csrc/fused_dense_cuda.cu(826): error: identifier "CUBLASLT_EPILOGUE_BGRADB" is undefined
csrc/fused_dense_cuda.cu(920): error: identifier "CUBLASLT_EPILOGUE_DGELU_BGRAD" is undefined
csrc/fused_dense_cuda.cu(936): error: identifier "CUBLASLT_MATMUL_DESC_EPILOGUE_AUX_POINTER" is undefined
csrc/fused_dense_cuda.cu(940): error: identifier "CUBLASLT_MATMUL_DESC_EPILOGUE_AUX_LD" is undefined
csrc/fused_dense_cuda.cu(1055): error: identifier "CUBLASLT_EPILOGUE_DGELU_BGRAD" is undefined
csrc/fused_dense_cuda.cu(1071): error: identifier "CUBLASLT_MATMUL_DESC_EPILOGUE_AUX_POINTER" is undefined
csrc/fused_dense_cuda.cu(1075): error: identifier "CUBLASLT_MATMUL_DESC_EPILOGUE_AUX_LD" is undefined
csrc/fused_dense_cuda.cu(1203): warning: variable "beta_one" was declared but never referenced
csrc/fused_dense_cuda.cu(1332): warning: variable "beta_one" was declared but never referenced
16 errors detected in the compilation of "csrc/fused_dense_cuda.cu".
error: command '/usr/local/cuda/bin/nvcc' failed with exit status 1
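One way to confirm that these errors come from the installed headers rather than from the build commands is to search the cuBLASLt header for the identifiers nvcc reports as undefined. This is a minimal sketch, assuming the toolkit lives under /usr/local/cuda (matching the nvcc path in the log); the function name missing_cublaslt_symbols is hypothetical, and the check is a plain text search, not a compile test.

```shell
# Hypothetical helper: report which identifiers flagged in
# csrc/fused_dense_cuda.cu are absent from a cuBLASLt header.
# Assumption: default header path /usr/local/cuda/include/cublasLt.h;
# pass another path as the first argument if your toolkit lives elsewhere.

# missing_cublaslt_symbols [HEADER]
# Prints each missing identifier on its own line.
# Returns 0 if all are found, 1 if any is missing, 2 if HEADER is unreadable.
missing_cublaslt_symbols() {
    mcs_header=${1:-/usr/local/cuda/include/cublasLt.h}
    if [ ! -r "$mcs_header" ]; then
        echo "cannot read $mcs_header" >&2
        return 2
    fi
    mcs_status=0
    for mcs_sym in CUBLASLT_EPILOGUE_GELU_AUX \
                   CUBLASLT_EPILOGUE_GELU_AUX_BIAS \
                   CUBLASLT_EPILOGUE_BGRADB \
                   CUBLASLT_EPILOGUE_DGELU_BGRAD \
                   CUBLASLT_MATMUL_DESC_EPILOGUE_AUX_POINTER \
                   CUBLASLT_MATMUL_DESC_EPILOGUE_AUX_LD; do
        # -w: whole-word match, so ..._GELU_AUX is not satisfied
        # by a line that only mentions ..._GELU_AUX_BIAS
        if ! grep -qw "$mcs_sym" "$mcs_header"; then
            echo "$mcs_sym"
            mcs_status=1
        fi
    done
    return "$mcs_status"
}
```

Run inside the failing container, an empty output suggests the headers do declare these identifiers, while a list matching the log points at a cuBLAS that is too old for the current fused_dense_cuda.cu.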
I got the same error when compiling with python setup.py --cuda_ext --cpp_ext build
on Arch Linux with CUDA 11.4. The full log is attached: python-apex-git.log.txt
Same error with pip3 install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
Ubuntu 20.04, CUDA 11.1
P.S. The old Apex version 0.1 from June 30 builds fine.
I have the same problem. It comes from a difference in cuBLAS versions: CUBLASLT_EPILOGUE_GELU_AUX exists in CUDA 11.4 but not in CUDA 11.3. Does anybody know how to go back to an older version of apex using git?
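Since the failure depends on which cuBLAS the build actually compiles against, it can help to read that version straight from the headers. This is a sketch under the assumption that cublas_api.h defines CUBLAS_VER_MAJOR, CUBLAS_VER_MINOR and CUBLAS_VER_PATCH and sits under /usr/local/cuda/include; the function name cublas_header_version is hypothetical.

```shell
# Hypothetical helper: print the cuBLAS version a build would see,
# taken from the CUBLAS_VER_* defines in cublas_api.h.
# Assumption: default header path /usr/local/cuda/include/cublas_api.h.

# cublas_header_version [HEADER]
# Prints MAJOR.MINOR.PATCH; exits non-zero if any define is missing.
cublas_header_version() {
    awk '
        $1 == "#define" && $2 == "CUBLAS_VER_MAJOR" { major = $3 }
        $1 == "#define" && $2 == "CUBLAS_VER_MINOR" { minor = $3 }
        $1 == "#define" && $2 == "CUBLAS_VER_PATCH" { patch = $3 }
        END {
            if (major == "" || minor == "" || patch == "") exit 1
            printf "%s.%s.%s\n", major, minor, patch
        }
    ' "${1:-/usr/local/cuda/include/cublas_api.h}"
}
```

Note that nvcc --version reports the CUDA toolkit release, whereas this reads the cuBLAS library version; the two numbers need not match, so comparing them across machines can explain why one setup builds and another does not.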
My old apex checkout builds fine; it is 23 commits behind. git reset --hard 0c2c6eea
Useful!
Thanks:)
git checkout 0c2c6eea6556b208d1a8711197efc94899e754e1 (July 17) works too. I found it by looking in git log for the first apex version that contains the GeLU function, and I was able to install it as well.
That said, I recommend installing CUDA 11.4.
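The pinning workaround can be folded into a Dockerfile-style build so that it only kicks in when the headers lack the new epilogue. This is a minimal sketch, assuming the header path /usr/local/cuda/include/cublasLt.h and that master is the branch to build otherwise; the function name apex_ref_for_header and the variable APEX_FALLBACK_COMMIT are hypothetical, while the commit hash is the one reported above.

```shell
# Hypothetical helper: choose which apex ref to build.
# If the cuBLASLt header declares CUBLASLT_EPILOGUE_GELU_AUX, build master;
# otherwise fall back to the commit reported to build in this thread.
# Assumption: default header path /usr/local/cuda/include/cublasLt.h.
# An unreadable header also selects the fallback (grep prints a warning).
APEX_FALLBACK_COMMIT=0c2c6eea6556b208d1a8711197efc94899e754e1

# apex_ref_for_header [HEADER]
apex_ref_for_header() {
    if grep -qw CUBLASLT_EPILOGUE_GELU_AUX "${1:-/usr/local/cuda/include/cublasLt.h}"; then
        echo master
    else
        echo "$APEX_FALLBACK_COMMIT"
    fi
}
```

For example, after git clone https://github.com/NVIDIA/apex && cd apex, running git checkout "$(apex_ref_for_header)" before the original pip install line would build either master or the pinned commit. This is only a stopgap: later in the thread, master is reported to build again once the guard fix was merged.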
@seryilmaz it seems your recent change needs a guard for older cuBLAS versions, e.g. in https://github.com/NVIDIA/apex/blob/ae1cdd64314e598b935a8138b3532d4b652a8f12/csrc/fused_dense_cuda.cu#L687
I've merged https://github.com/NVIDIA/apex/pull/1162. Could you pull the latest master and retry the build, please?
@ptrblck I can now compile and build python-apex-git on Arch Linux. Thanks.
I had the same error even after changing the setup.py file. It installed successfully after switching to CUDA 11.3. This CUDA/cuDNN installation script was very helpful: Here!
Following the earlier suggestion (git reset --hard 0c2c6ee, 23 commits behind), Apex installs once this version is checked out.