apex
ImportError: No module named 'fused_layer_norm_cuda'
CUDA: 9.0 (V9.0.176), gcc: 5.4.0, torch: 1.1.0, python: 3.5, Ubuntu: 16.04
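Version details like these can also be collected from Python rather than by hand (PyTorch additionally ships `python -m torch.utils.collect_env` for a fuller report). The sketch below is a minimal, hypothetical helper: `collect_env` is an illustrative name, and torch is treated as optional so the script still runs where torch is not importable.

```python
import platform


def collect_env():
    """Gather the version details this issue reports by hand (a sketch).

    If torch cannot be imported, its fields stay None.
    """
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
        "torch": None,
        "torch_cuda": None,
        "cuda_home": None,
    }
    try:
        import torch
        from torch.utils.cpp_extension import CUDA_HOME
    except ImportError:
        return info
    info["torch"] = torch.__version__
    info["torch_cuda"] = torch.version.cuda  # CUDA version torch was built with (None on CPU builds)
    info["cuda_home"] = CUDA_HOME            # toolkit root torch uses to build C++/CUDA extensions
    return info


if __name__ == "__main__":
    for key, value in sorted(collect_env().items()):
        print("{}: {}".format(key, value))
```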
$ git clone https://github.com/NVIDIA/apex
$ cd apex
$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
I used the commands above to install apex, but got this error:
Cleaning up...
Removing source in /tmp/pip-req-build-l2_6ouft
Removed build tracker '/tmp/pip-req-tracker-3uw_mz6k'
Command "/usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-l2_6ouft/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" --cpp_ext --cuda_ext install --record /tmp/pip-record-6kq29sjc/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-req-build-l2_6ouft/
Exception information:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/pip/_internal/cli/base_command.py", line 143, in main
status = self.run(options, args)
File "/usr/local/lib/python3.5/dist-packages/pip/_internal/commands/install.py", line 366, in run
use_user_site=options.use_user_site,
File "/usr/local/lib/python3.5/dist-packages/pip/_internal/req/__init__.py", line 49, in install_given_reqs
**kwargs
File "/usr/local/lib/python3.5/dist-packages/pip/_internal/req/req_install.py", line 791, in install
spinner=spinner,
File "/usr/local/lib/python3.5/dist-packages/pip/_internal/utils/misc.py", line 705, in call_subprocess
% (command_desc, proc.returncode, cwd))
pip._internal.exceptions.InstallationError: Command "/usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-req-build-l2_6ouft/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" --cpp_ext --cuda_ext install --record /tmp/pip-record-6kq29sjc/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-req-build-l2_6ouft/
1 location(s) to search for versions of pip:
* https://pypi.org/simple/pip/
Getting page https://pypi.org/simple/pip/
Starting new HTTPS connection (1): pypi.org:443
https://pypi.org:443 "GET /simple/pip/ HTTP/1.1" 200 11244
Analyzing links from page https://pypi.org/simple/pip/
Found link https://files.pythonhosted.org/packages/3d/9d/1e313763bdfb6a48977b65829c6ce2a43eaae29ea2f907c8bbef024a7219/pip-0.2.tar.gz#sha256=88bb8d029e1bf4acd0e04d300104b7440086f94cc1ce1c5c3c31e3293aee1f81 (from https://pypi.org/simple/pip/), version: 0.2
Found link https://files.pythonhosted.org/packages/18/ad/c0fe6cdfe1643a19ef027c7168572dac6283b80a384ddf21b75b921877da/pip-0.2.1.tar.gz#sha256=83522005c1266cc2de97e65072ff7554ac0f30ad369c3b02ff3a764b962048da (from https://pypi.org/simple/pip/), version: 0.2.1
Found link https://files.pythonhosted.org/packages/17/05/f66144ef69b436d07f8eeeb28b7f77137f80de4bf60349ec6f0f9509e801/pip-0.3.tar.gz#sha256=183c72455cb7f8860ac1376f8c4f14d7f545aeab8ee7c22cd4caf79f35a2ed47 (from https://pypi.org/simple/pip/), version: 0.3
Found link https://files.pythonhosted.org/packages/0a/bb/d087c9a1415f8726e683791c0b2943c53f2b76e69f527f2e2b2e9f9e7b5c/pip-0.3.1.tar.gz#sha256=34ce534f17065c78f980702928e988a6b6b2d8a9851aae5f1571a1feb9bb58d8 (from https://pypi.org/simple/pip/), version: 0.3.1
Found link https://files.pythonhosted.org/packages/cf/c3/153571aaac6cf999f4bb09c019b1ff379b7b599ea833813a41c784eec995/pip-0.4.tar.gz#sha256=28fc67558874f71fddda7168f73595f1650523dce3bc5bf189713ecdfc1e456e (from https://pypi.org/simple/pip/), version: 0.4
Found link https://files.pythonhosted.org/packages/8d/c7/f05c87812fa5d9562ecbc5f4f1fc1570444f53c81c834a7f662af406e3c1/pip-0.5.tar.gz#sha256=328d8412782f22568508a0d0c78a49c9920a82e44c8dfca49954fe525c152b2a (from https://pypi.org/simple/pip/), version: 0.5
Found link https://files.pythonhosted.org/packages/9a/aa/f536b6d14fe03343367da2ff44eee28f340ae650cd017ca088b6be13084a/pip-0.5.1.tar.gz#sha256=e27650538c41fe1007a41abd4cfd0f905b822622cbe1f8e7e09d1215af207694 (from https://pypi.org/simple/pip/), version: 0.5.1
Found link https://files.pythonhosted.org/packages/db/e6/fdf7be8a17b032c533d3f91e91e2c63dd81d3627cbe4113248a00c2d39d8/pip-0.6.tar.gz#sha256=4cf47db6815b2f435d1f44e1f35ff04823043f6161f7df9aec71a123b0c47f0d (from https://pypi.org/simple/pip/), version: 0.6
Found link https://files.pythonhosted.org/packages/91/cd/105f4d3c75d0ae18e12623acc96f42168aaba408dd6e43c4505aa21f8e37/pip-0.6.1.tar.gz#sha256=efe47e84ffeb0ea4804f9858b8a94bebd07f5452f907ebed36d03aed06a9f9ec (from https://pypi.org/simple/pip/), version: 0.6.1
Found link https://files.pythonhosted.org/packages/1c/c7/c0e1a9413c37828faf290f29
This is only part of the log.
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import apex
>>> import importlib
>>> importlib.import_module("fused_layer_norm_cuda")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.5/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 986, in _gcd_import
File "<frozen importlib._bootstrap>", line 969, in _find_and_load
File "<frozen importlib._bootstrap>", line 956, in _find_and_load_unlocked
ImportError: No module named 'fused_layer_norm_cuda'
>>>
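The REPL check above can be extended to every compiled extension that shows up in this thread. This is a minimal sketch: the module list is taken from the error message and the build log, and whether a particular apex build produces all of them depends on the setup.py flags used, so treat the list as an assumption.

```python
import importlib.util

# Extension names that appear in this thread (error message and build log).
# Which extensions a given apex build actually produces depends on the flags
# passed to its setup.py, so this list is an assumption.
APEX_EXTENSIONS = ["apex_C", "amp_C", "fused_layer_norm_cuda"]


def check_extensions(names):
    """Map each top-level module name to True if Python can locate it.

    find_spec only searches sys.path; it does not execute the module.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in sorted(check_extensions(APEX_EXTENSIONS).items()):
        print("{:<25} {}".format(name, "found" if found else "MISSING"))
```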
I wanted to reinstall apex, so I tried to uninstall it. However, I got this warning:
Skipping apex as it is not installed.
I have added the CUDA paths to my .bashrc file:
export LIBRARY_PATH=/usr/local/cuda/lib64${LIBRARY_PATH:+:${LIBRARY_PATH}}
# tell the OS where to find cuda shared objects at load time (runtime)
LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
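Note that the LD_LIBRARY_PATH line above has no `export`; in bash, a plain assignment like that only reaches child processes (such as pip or setup.py) if the variable was already exported. A minimal sketch for checking what a Python process started from the same shell actually inherits; `path_list_contains` is a hypothetical helper name.

```python
import os

CUDA_LIB_DIR = "/usr/local/cuda/lib64"  # directory taken from the .bashrc lines above


def path_list_contains(value, directory):
    """True if `directory` is one of the entries of a PATH-style string."""
    if not value:
        return False
    target = os.path.normpath(directory)
    return any(os.path.normpath(p) == target for p in value.split(os.pathsep) if p)


if __name__ == "__main__":
    # os.environ reflects only what this process inherited, i.e. exported variables.
    for var in ("LIBRARY_PATH", "LD_LIBRARY_PATH"):
        ok = path_list_contains(os.environ.get(var), CUDA_LIB_DIR)
        print(var, "contains" if ok else "does NOT contain", CUDA_LIB_DIR)
```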
I also tried to install apex with this command:
python3 setup.py install --cuda_ext --cpp_ext
But it fails with this error:
building 'apex_C' extension
creating build/temp.linux-x86_64-3.5
creating build/temp.linux-x86_64-3.5/csrc
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.5/dist-packages/torch/include -I/usr/local/lib/python3.5/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.5/dist-packages/torch/include/TH -I/usr/local/lib/python3.5/dist-packages/torch/include/THC -I/usr/include/python3.5m -c csrc/flatten_unflatten.cpp -o build/temp.linux-x86_64-3.5/csrc/flatten_unflatten.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=apex_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.5/csrc/flatten_unflatten.o -o build/lib.linux-x86_64-3.5/apex_C.cpython-35m-x86_64-linux-gnu.so
building 'amp_C' extension
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/local/lib/python3.5/dist-packages/torch/include -I/usr/local/lib/python3.5/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.5/dist-packages/torch/include/TH -I/usr/local/lib/python3.5/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.5m -c csrc/amp_C_frontend.cpp -o build/temp.linux-x86_64-3.5/csrc/amp_C_frontend.o -O3 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
/usr/local/cuda/bin/nvcc -I/usr/local/lib/python3.5/dist-packages/torch/include -I/usr/local/lib/python3.5/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.5/dist-packages/torch/include/TH -I/usr/local/lib/python3.5/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.5m -c csrc/multi_tensor_scale_kernel.cu -o build/temp.linux-x86_64-3.5/csrc/multi_tensor_scale_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
In file included from csrc/multi_tensor_scale_kernel.cu:3:0:
/usr/local/lib/python3.5/dist-packages/torch/include/ATen/cuda/CUDAContext.h:12:22: fatal error: cusparse.h: No such file or directory
compilation terminated.
error: command '/usr/local/cuda/bin/nvcc' failed with exit status 1
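The failing nvcc command lists its include directories via `-I` flags, and the fatal error means cusparse.h was not found in any directory nvcc searched. A small sketch (hypothetical helper names, not part of apex or torch) that pulls the `-I` directories out of a pasted command line and reports which of them actually contain the header:

```python
import os
import shlex


def include_dirs(command_line):
    """Extract the -I<dir> / -I <dir> include directories from a compiler command line."""
    dirs = []
    tokens = shlex.split(command_line)
    for i, tok in enumerate(tokens):
        if tok.startswith("-I") and len(tok) > 2:
            dirs.append(tok[2:])
        elif tok == "-I" and i + 1 < len(tokens):
            dirs.append(tokens[i + 1])
    return dirs


def locate_header(header, dirs):
    """Return, in order, the directories that directly contain `header`."""
    return [d for d in dirs if os.path.isfile(os.path.join(d, header))]


if __name__ == "__main__":
    # Shortened from the failing nvcc command in the log above.
    cmd = ("/usr/local/cuda/bin/nvcc -I/usr/local/lib/python3.5/dist-packages/torch/include "
           "-I/usr/local/cuda/include -I/usr/include/python3.5m -c csrc/multi_tensor_scale_kernel.cu")
    dirs = include_dirs(cmd)
    print("searched:", dirs)
    print("cusparse.h found in:", locate_header("cusparse.h", dirs) or "none of them")
```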
I also tried this suggestion:
export CUDA_HOME=/usr/local/cuda/bin/nvcc
python3 setup.py install --cuda_ext --cpp_ext
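torch.utils.cpp_extension treats CUDA_HOME as the root of the CUDA toolkit and looks for nvcc under its bin directory (the later log in this thread shows the same layout: nvcc picked up from /usr/local/cuda-10.2//bin), so pointing it at the nvcc binary itself is unlikely to work. A minimal sketch of a sanity check; the helper name and the exact checks (bin/nvcc and include/ present) are assumptions based on the paths in the build logs.

```python
import os


def cuda_home_problems(cuda_home):
    """Return reasons why `cuda_home` does not look like a CUDA toolkit root.

    Assumption: a usable root contains bin/nvcc and include/, matching the
    layout seen in the build logs (e.g. /usr/local/cuda/bin/nvcc).
    """
    if not cuda_home:
        return ["CUDA_HOME is not set"]
    if not os.path.isdir(cuda_home):
        return ["{} is not a directory (is it pointing at nvcc itself?)".format(cuda_home)]
    problems = []
    if not os.path.isfile(os.path.join(cuda_home, "bin", "nvcc")):
        problems.append("bin/nvcc not found under {}".format(cuda_home))
    if not os.path.isdir(os.path.join(cuda_home, "include")):
        problems.append("include/ not found under {}".format(cuda_home))
    return problems


if __name__ == "__main__":
    home = os.environ.get("CUDA_HOME")
    issues = cuda_home_problems(home)
    print("\n".join(issues) if issues else "CUDA_HOME looks like a toolkit root: {}".format(home))
```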
If you have a good solution, please let me know. Thanks very much.
Hi @qlwang25,
the first installation seems to have failed, but I cannot find the actual error message. Did you see any errors in the build output?
Regarding importing apex and not being able to uninstall it: is your current working directory the apex source folder?
If so, could you change the working directory and try to import apex again?
It should now throw an error instead of importing apex from the local folder.
Note that the install commands end with the directory specifier ./, while it seems to be missing in the last two commands.
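The working-directory point above can be checked directly: in an interactive session the current directory is searched first on sys.path, so an apex/ package folder in the cloned repo takes precedence over anything installed in site-packages. A minimal sketch; `local_shadow` is a hypothetical helper, and it only looks for a regular package (a folder with `__init__.py`) or a single `.py` module.

```python
import importlib.util
import os


def local_shadow(module_name, directory):
    """Return the path of a regular package or module named `module_name`
    directly inside `directory`, or None if there is none."""
    pkg = os.path.join(directory, module_name)
    if os.path.isfile(os.path.join(pkg, "__init__.py")):
        return pkg
    mod = pkg + ".py"
    if os.path.isfile(mod):
        return mod
    return None


if __name__ == "__main__":
    shadow = local_shadow("apex", os.getcwd())
    if shadow:
        print("apex in the current directory may shadow the installed one:", shadow)
    # Note: when run as a script, sys.path[0] is the script's folder, not the cwd.
    spec = importlib.util.find_spec("apex")
    print("import apex would load:", spec.origin if spec else None)
```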
@ptrblck Hi, 3 years later I'm running into the same problem. Hopefully I can provide some more information for debugging.
CUDA: 10.2, gcc: 7.5.0, torch: 1.10.0, pytorch-lightning: 1.5.9, python: 3.7.6, OS: Ubuntu 18.04.6 LTS
EDIT: This works with PyTorch 1.10.2, the default installation for this CUDA version.
torch and pytorch-lightning were installed via the recommended approach on the PyTorch website; however, I downgraded torch from the default 1.10.2 to 1.10.0.
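The log below reports both torch 1.10.0+cu102 and nvcc release 10.2. Because the extensions are compiled with the local nvcc against torch's headers, one thing worth ruling out is a mismatch between those two CUDA versions; here they agree, which points more toward the missing header itself. A small parsing sketch with hypothetical helper names; in practice the inputs would come from `nvcc --version` and `torch.version.cuda`.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Return (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_torch_cuda(version):
    """Return (major, minor) from a string such as torch.version.cuda ('10.2').

    torch.version.cuda is None for CPU-only builds; callers should check that first.
    """
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


# Copied from the nvcc lines in the log below.
NVCC_OUTPUT = """nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 10.2, V10.2.89"""

if __name__ == "__main__":
    bare_metal = parse_nvcc_release(NVCC_OUTPUT)
    torch_built = parse_torch_cuda("10.2")  # the cu102 suffix of torch 1.10.0+cu102
    print("nvcc:", bare_metal, "torch:", torch_built, "match:", bare_metal == torch_built)
```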
torch.__version__ = 1.10.0+cu102
setup.py:109: UserWarning: Option --pyprof not specified. Not installing PyProf dependencies!
warnings.warn("Option --pyprof not specified. Not installing PyProf dependencies!")
Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
from /usr/local/cuda-10.2//bin
running install
/notebooks/persistent/andrew-sciotti/envs/venvs/test-apex/lib/python3.7/site-packages/setuptools/command/install.py:37: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
setuptools.SetuptoolsDeprecationWarning,
/notebooks/persistent/andrew-sciotti/envs/venvs/test-apex/lib/python3.7/site-packages/setuptools/command/easy_install.py:159: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
EasyInstallDeprecationWarning,
running bdist_egg
running egg_info
writing apex.egg-info/PKG-INFO
writing dependency_links to apex.egg-info/dependency_links.txt
writing top-level names to apex.egg-info/top_level.txt
/notebooks/persistent/andrew-sciotti/envs/venvs/test-apex/lib/python3.7/site-packages/torch/utils/cpp_extension.py:381: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'apex.egg-info/SOURCES.txt'
adding license file 'LICENSE'
writing manifest file 'apex.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
building 'amp_C' extension
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/notebooks/persistent/andrew-sciotti/envs/venvs/test-apex/lib/python3.7/site-packages/torch/include -I/notebooks/persistent/andrew-sciotti/envs/venvs/test-apex/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/notebooks/persistent/andrew-sciotti/envs/venvs/test-apex/lib/python3.7/site-packages/torch/include/TH -I/notebooks/persistent/andrew-sciotti/envs/venvs/test-apex/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-10.2/include -I/.pyenv/versions/3.7.6/include/python3.7m -c csrc/amp_C_frontend.cpp -o build/temp.linux-x86_64-3.7/csrc/amp_C_frontend.o -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
/usr/local/cuda-10.2/bin/nvcc -I/notebooks/persistent/andrew-sciotti/envs/venvs/test-apex/lib/python3.7/site-packages/torch/include -I/notebooks/persistent/andrew-sciotti/envs/venvs/test-apex/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/notebooks/persistent/andrew-sciotti/envs/venvs/test-apex/lib/python3.7/site-packages/torch/include/TH -I/notebooks/persistent/andrew-sciotti/envs/venvs/test-apex/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-10.2/include -I/.pyenv/versions/3.7.6/include/python3.7m -c csrc/multi_tensor_sgd_kernel.cu -o build/temp.linux-x86_64-3.7/csrc/multi_tensor_sgd_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 -std=c++14
In file included from csrc/multi_tensor_sgd_kernel.cu:3:0:
/notebooks/persistent/andrew-sciotti/envs/venvs/test-apex/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:6:10: fatal error: cusparse.h: No such file or directory
#include <cusparse.h>
^~~~~~~~~~~~
compilation terminated.
error: command '/usr/local/cuda-10.2/bin/nvcc' failed with exit status 1
The issue is that cusparse.h does not exist. I searched for a solution, but I found only about two pages that seemed useful, and I think both of them just installed apex without the CUDA extensions.
I wasn't able to find that file anywhere on my machine. According to what I found online, it should be in /usr/local/cuda/include, but it isn't there for me.
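cuSPARSE is one of the libraries shipped with the CUDA toolkit, so a missing cusparse.h may suggest an incomplete toolkit installation. Before reinstalling, a quick sketch to rule out other toolkit directories, such as the versioned /usr/local/cuda-10.2 that appears in the log. The glob patterns are assumptions about typical install locations and `find_header` is a hypothetical helper.

```python
import glob
import os

# Assumption: typical locations for CUDA toolkit headers; adjust as needed.
DEFAULT_PATTERNS = ["/usr/local/cuda*/include", "/usr/include"]


def find_header(header, patterns=DEFAULT_PATTERNS):
    """Return every existing <dir>/<header> for directories matching the glob patterns."""
    hits = []
    for pattern in patterns:
        for directory in sorted(glob.glob(pattern)):
            candidate = os.path.join(directory, header)
            if os.path.isfile(candidate):
                hits.append(candidate)
    return hits


if __name__ == "__main__":
    found = find_header("cusparse.h")
    print("\n".join(found) if found else "cusparse.h not found in any candidate directory")
```

/usr/local/cuda is often a symlink to a versioned directory, so the same header may be listed twice.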