flash-attention
Build fails on CUDA 12.2 system
I am seeing the following...
python3 setup.py develop
torch.__version__ = 2.2.0a0+gitbbd5b93
running develop
running egg_info
writing flash_attn.egg-info/PKG-INFO
writing dependency_links to flash_attn.egg-info/dependency_links.txt
writing requirements to flash_attn.egg-info/requires.txt
writing top-level names to flash_attn.egg-info/top_level.txt
reading manifest file 'flash_attn.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.cu' under directory 'flash_attn'
warning: no files found matching '*.h' under directory 'flash_attn'
warning: no files found matching '*.cuh' under directory 'flash_attn'
warning: no files found matching '*.cpp' under directory 'flash_attn'
warning: no files found matching '*.hpp' under directory 'flash_attn'
adding license file 'LICENSE'
adding license file 'AUTHORS'
writing manifest file 'flash_attn.egg-info/SOURCES.txt'
running build_ext
building 'flash_attn_2_cuda' extension
Emitting ninja build file /root/gg/git/flash-attention/build/temp.linux-x86_64-3.9/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/49] /usr/local/cuda-12.2/bin/nvcc --generate-dependencies-with-compile --dependency-output /root/gg/git/flash-attention/build/temp.linux-x86_64-3.9/csrc/flash_attn/src/flash_bwd_hdim192_fp16_sm80.o.d -I/root/gg/git/flash-attention/csrc/flash_attn -I/root/gg/git/flash-attention/csrc/flash_attn/src -I/root/gg/git/flash-attention/csrc/cutlass/include -I/root/gg/git/pytorch/torch/include -I/root/gg/git/pytorch/torch/include/torch/csrc/api/include -I/root/gg/git/pytorch/torch/include/TH -I/root/gg/git/pytorch/torch/include/THC -I/usr/local/cuda-12.2/include -I/usr/include/python3.9 -c -c /root/gg/git/flash-attention/csrc/flash_attn/src/flash_bwd_hdim192_fp16_sm80.cu -o /root/gg/git/flash-attention/build/temp.linux-x86_64-3.9/c
....
....
[8/49] /usr/local/cuda-12.2/bin/nvcc --generate-dependencies-with-compile --dependency-output /root/gg/git/flash-attention/build/temp.linux-x86_64-3.9/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.o.d -I/root/gg/git/flash-attention/csrc/flash_attn -I/root/gg/git/flash-attention/csrc/flash_attn/src -I/root/gg/git/flash-attention/csrc/cutlass/include -I/root/gg/git/pytorch/torch/include -I/root/gg/git/pytorch/torch/include/torch/csrc/api/include -I/root/gg/git/pytorch/torch/include/TH -I/root/gg/git/pytorch/torch/include/THC -I/usr/local/cuda-12.2/include -I/usr/include/python3.9 -c -c /root/gg/git/flash-attention/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.cu -o /root/gg/git/flash-attention/build/temp.linux-x86_64-3.9/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=1
FAILED: /root/gg/git/flash-attention/build/temp.linux-x86_64-3.9/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.o
/usr/local/cuda-12.2/bin/nvcc --generate-dependencies-with-compile --dependency-output /root/gg/git/flash-attention/build/temp.linux-x86_64-3.9/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.o.d -I/root/gg/git/flash-attention/csrc/flash_attn -I/root/gg/git/flash-attention/csrc/flash_attn/src -I/root/gg/git/flash-attention/csrc/cutlass/include -I/root/gg/git/pytorch/torch/include -I/root/gg/git/pytorch/torch/include/torch/csrc/api/include -I/root/gg/git/pytorch/torch/include/TH -I/root/gg/git/pytorch/torch/include/THC -I/usr/local/cuda-12.2/include -I/usr/include/python3.9 -c -c /root/gg/git/flash-attention/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.cu -o /root/gg/git/flash-attention/build/temp.linux-x86_64-3.9/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=1
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/root/gg/git/pytorch/torch/utils/cpp_extension.py", line 2102, in _run_ninja_build
subprocess.run(
File "/usr/lib64/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/root/gg/git/flash-attention/setup.py", line 288, in <module>
setup(
File "/usr/lib/python3.9/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/usr/lib64/python3.9/distutils/core.py", line 148, in setup
dist.run_commands()
File "/usr/lib64/python3.9/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/usr/lib64/python3.9/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/lib/python3.9/site-packages/setuptools/command/develop.py", line 34, in run
self.install_for_development()
File "/usr/lib/python3.9/site-packages/setuptools/command/develop.py", line 136, in install_for_development
self.run_command('build_ext')
File "/usr/lib64/python3.9/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/lib64/python3.9/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/usr/lib64/python3.9/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/root/gg/git/pytorch/torch/utils/cpp_extension.py", line 873, in build_extensions
build_ext.build_extensions(self)
File "/usr/lib64/python3.9/distutils/command/build_ext.py", line 449, in build_extensions
self._build_extensions_serial()
File "/usr/lib64/python3.9/distutils/command/build_ext.py", line 474, in _build_extensions_serial
self.build_extension(ext)
File "/usr/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 196, in build_extension
_build_ext.build_extension(self, ext)
File "/usr/lib64/python3.9/distutils/command/build_ext.py", line 529, in build_extension
objects = self.compiler.compile(sources,
File "/root/gg/git/pytorch/torch/utils/cpp_extension.py", line 686, in unix_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "/root/gg/git/pytorch/torch/utils/cpp_extension.py", line 1774, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "/root/gg/git/pytorch/torch/utils/cpp_extension.py", line 2118, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
[root@guyen-MS-7B22 git]#
[root@guyen-MS-7B22 git]# git remtoe -
git: 'remtoe' is not a git command. See 'git --help'.
The most similar command is
remote
[root@guyen-MS-7B22 git]# git remote -
fatal: not a git repository (or any of the parent directories): .git
[root@guyen-MS-7B22 git]# git remote -v
fatal: not a git repository (or any of the parent directories): .git
[root@guyen-MS-7B22 git]# cd flash-attention/
[root@guyen-MS-7B22 flash-attention]# git remote -v
origin https://github.com/Dao-AILab/flash-attention.git (fetch)
origin https://github.com/Dao-AILab/flash-attention.git (push)
[root@guyen-MS-7B22 flash-attention]# cat /etc/os-release ; uname -r
NAME="CentOS Stream"
VERSION="9"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="9"
PLATFORM_ID="platform:el9"
PRETTY_NAME="CentOS Stream 9"
ANSI_COLOR="0;31"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:centos:centos:9"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux 9"
REDHAT_SUPPORT_PRODUCT_VERSION="CentOS Stream"
5.19.0-38-generic
We have prebuilt CUDA wheels that will be downloaded if you install with pip install flash-attn --no-build-isolation. Then you wouldn't have to compile things yourself.
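For reference, a minimal sketch of that route (assuming a matching prebuilt wheel exists for your Python/torch/CUDA combination):

```bash
# Install build-time helpers the package expects, then let pip pull the prebuilt CUDA wheel.
# --no-build-isolation makes any fallback source build use the torch already installed.
pip install packaging ninja
pip install flash-attn --no-build-isolation
```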
Yeah, I saw that. However, can you help with the build issue? My environment specifically requires building it manually... Is there a stable release branch where the build is also reliable?
Environments are so different that it's hard to know, and I'm not an expert on compiling or building. There's no obvious error message pointing to a specific line in your log.
I use NVIDIA's PyTorch Docker image, which has all the libraries and compilers ready.
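For anyone wanting to reproduce that environment, a rough sketch (the NGC tag below is only an example; pick a current one):

```bash
# NGC PyTorch images ship a matched CUDA toolkit, compiler, and torch build.
docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:23.12-py3 bash
# inside the container:
pip install flash-attn --no-build-isolation
```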
You can try setting MAX_JOBS=4, as mentioned in the README, in case the build failed because of OOM.
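Concretely, something like this (a sketch; each parallel nvcc job can use several GB of RAM, so scale the count to your machine):

```bash
# Cap the number of parallel compilation jobs to reduce peak memory use.
MAX_JOBS=4 python setup.py develop
# or, when installing from source via pip:
MAX_JOBS=4 pip install flash-attn --no-build-isolation
```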
MAX_JOBS=4 failed with a similar error; I don't believe it is OOM.
Yeah, then I don't know how to fix it.
Hmm, is there a way you can forward this to someone who can? If no one here can help, where else can I get help?
I am getting the same error with H100 GPUs. I have tried all the different installation methods, and right now I am trying in a fresh conda environment. Still, I get this error (truncated; a quick environment check is sketched after the log):
_run_ninja_build(
File "/miniconda3/envs/pytorch_cuda/lib/python3.12/site-packages/torch/utils/cpp_extension.py", line 2112, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for flash-attn
Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects
@tridao any idea?
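One generic check worth doing before anything else (not specific to flash-attention): confirm that the nvcc on PATH, the CUDA version PyTorch was built against, and ninja itself all line up, since mismatches often surface as exactly this kind of opaque ninja failure. A sketch:

```bash
# CUDA toolkit driving nvcc vs. the CUDA runtime torch was built for.
nvcc --version | grep release
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.get_device_capability())"
# The extension build shells out to ninja, so make sure it is present and runnable.
ninja --version
```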
As of today, the build starts OK but takes forever. Any idea?
/usr/local/cuda-12.3/bin/nvcc -I/root/extdir/gg/git/flash-attention/csrc/flash_attn -I/root/extdir/gg/git/flash-attention/csrc/flash_attn/src -I/root/extdir/gg/git/flash-attention/csrc/cutlass/include -I/miniconda3/lib/python3.11/site-packages/torch/include -I/miniconda3/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/miniconda3/lib/python3.11/site-packages/torch/include/TH -I/miniconda3/lib/python3.11/site-packages/torch/include/THC -I/usr/local/cuda-12.3/include -I/miniconda3/include/python3.11 -c csrc/flash_attn/src/flash_fwd_split_hdim128_fp16_sm80.cu -o build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim128_fp16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
/usr/local/cuda-12.3/bin/nvcc -I/root/extdir/gg/git/flash-attention/csrc/flash_attn -I/root/extdir/gg/git/flash-attention/csrc/flash_attn/src -I/root/extdir/gg/git/flash-attention/csrc/cutlass/include -I/miniconda3/lib/python3.11/site-packages/torch/include -I/miniconda3/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/miniconda3/lib/python3.11/site-packages/torch/include/TH -I/miniconda3/lib/python3.11/site-packages/torch/include/THC -I/usr/local/cuda-12.3/include -I/miniconda3/include/python3.11 -c csrc/flash_attn/src/flash_fwd_split_hdim160_bf16_sm80.cu -o build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim160_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
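If the question is whether the build is stuck rather than just slow (there are dozens of heavily templated translation units, each compiled with `--threads 4`, so an hour or more is not unusual), a monitoring sketch using standard tools:

```bash
# Check that nvcc/cicc processes are still alive and see how much RAM they use.
watch -n 30 'ps -eo pid,etime,rss,cmd | grep -E "[n]vcc|[c]icc" | head; free -h'
# If the kernel OOM-killer has been terminating compiler jobs, it will show up here.
dmesg -T | grep -iE 'out of memory|killed process'
```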