flash-attention
Build fails on CUDA 12.2 system
I am seeing the following...
python3 setup.py develop
torch.__version__ = 2.2.0a0+gitbbd5b93
running develop
running egg_info
writing flash_attn.egg-info/PKG-INFO
writing dependency_links to flash_attn.egg-info/dependency_links.txt
writing requirements to flash_attn.egg-info/requires.txt
writing top-level names to flash_attn.egg-info/top_level.txt
reading manifest file 'flash_attn.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.cu' under directory 'flash_attn'
warning: no files found matching '*.h' under directory 'flash_attn'
warning: no files found matching '*.cuh' under directory 'flash_attn'
warning: no files found matching '*.cpp' under directory 'flash_attn'
warning: no files found matching '*.hpp' under directory 'flash_attn'
adding license file 'LICENSE'
adding license file 'AUTHORS'
writing manifest file 'flash_attn.egg-info/SOURCES.txt'
running build_ext
building 'flash_attn_2_cuda' extension
Emitting ninja build file /root/gg/git/flash-attention/build/temp.linux-x86_64-3.9/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/49] /usr/local/cuda-12.2/bin/nvcc --generate-dependencies-with-compile --dependency-output /root/gg/git/flash-attention/build/temp.linux-x86_64-3.9/csrc/flash_attn/src/flash_bwd_hdim192_fp16_sm80.o.d -I/root/gg/git/flash-attention/csrc/flash_attn -I/root/gg/git/flash-attention/csrc/flash_attn/src -I/root/gg/git/flash-attention/csrc/cutlass/include -I/root/gg/git/pytorch/torch/include -I/root/gg/git/pytorch/torch/include/torch/csrc/api/include -I/root/gg/git/pytorch/torch/include/TH -I/root/gg/git/pytorch/torch/include/THC -I/usr/local/cuda-12.2/include -I/usr/include/python3.9 -c -c /root/gg/git/flash-attention/csrc/flash_attn/src/flash_bwd_hdim192_fp16_sm80.cu -o /root/gg/git/flash-attention/build/temp.linux-x86_64-3.9/c
....
....
[8/49] /usr/local/cuda-12.2/bin/nvcc --generate-dependencies-with-compile --dependency-output /root/gg/git/flash-attention/build/temp.linux-x86_64-3.9/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.o.d -I/root/gg/git/flash-attention/csrc/flash_attn -I/root/gg/git/flash-attention/csrc/flash_attn/src -I/root/gg/git/flash-attention/csrc/cutlass/include -I/root/gg/git/pytorch/torch/include -I/root/gg/git/pytorch/torch/include/torch/csrc/api/include -I/root/gg/git/pytorch/torch/include/TH -I/root/gg/git/pytorch/torch/include/THC -I/usr/local/cuda-12.2/include -I/usr/include/python3.9 -c -c /root/gg/git/flash-attention/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.cu -o /root/gg/git/flash-attention/build/temp.linux-x86_64-3.9/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=1
FAILED: /root/gg/git/flash-attention/build/temp.linux-x86_64-3.9/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.o
/usr/local/cuda-12.2/bin/nvcc --generate-dependencies-with-compile --dependency-output /root/gg/git/flash-attention/build/temp.linux-x86_64-3.9/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.o.d -I/root/gg/git/flash-attention/csrc/flash_attn -I/root/gg/git/flash-attention/csrc/flash_attn/src -I/root/gg/git/flash-attention/csrc/cutlass/include -I/root/gg/git/pytorch/torch/include -I/root/gg/git/pytorch/torch/include/torch/csrc/api/include -I/root/gg/git/pytorch/torch/include/TH -I/root/gg/git/pytorch/torch/include/THC -I/usr/local/cuda-12.2/include -I/usr/include/python3.9 -c -c /root/gg/git/flash-attention/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.cu -o /root/gg/git/flash-attention/build/temp.linux-x86_64-3.9/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=1
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/root/gg/git/pytorch/torch/utils/cpp_extension.py", line 2102, in _run_ninja_build
subprocess.run(
File "/usr/lib64/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/root/gg/git/flash-attention/setup.py", line 288, in <module>
setup(
File "/usr/lib/python3.9/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/usr/lib64/python3.9/distutils/core.py", line 148, in setup
dist.run_commands()
File "/usr/lib64/python3.9/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/usr/lib64/python3.9/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/lib/python3.9/site-packages/setuptools/command/develop.py", line 34, in run
self.install_for_development()
File "/usr/lib/python3.9/site-packages/setuptools/command/develop.py", line 136, in install_for_development
self.run_command('build_ext')
File "/usr/lib64/python3.9/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/lib64/python3.9/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/usr/lib64/python3.9/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/root/gg/git/pytorch/torch/utils/cpp_extension.py", line 873, in build_extensions
build_ext.build_extensions(self)
File "/usr/lib64/python3.9/distutils/command/build_ext.py", line 449, in build_extensions
self._build_extensions_serial()
File "/usr/lib64/python3.9/distutils/command/build_ext.py", line 474, in _build_extensions_serial
self.build_extension(ext)
File "/usr/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 196, in build_extension
_build_ext.build_extension(self, ext)
File "/usr/lib64/python3.9/distutils/command/build_ext.py", line 529, in build_extension
objects = self.compiler.compile(sources,
File "/root/gg/git/pytorch/torch/utils/cpp_extension.py", line 686, in unix_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "/root/gg/git/pytorch/torch/utils/cpp_extension.py", line 1774, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "/root/gg/git/pytorch/torch/utils/cpp_extension.py", line 2118, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
[root@guyen-MS-7B22 git]#
[root@guyen-MS-7B22 git]# git remtoe -
git: 'remtoe' is not a git command. See 'git --help'.
The most similar command is
remote
[root@guyen-MS-7B22 git]# git remote -
fatal: not a git repository (or any of the parent directories): .git
[root@guyen-MS-7B22 git]# git remote -v
fatal: not a git repository (or any of the parent directories): .git
[root@guyen-MS-7B22 git]# cd flash-attention/
[root@guyen-MS-7B22 flash-attention]# git remote -v
origin https://github.com/Dao-AILab/flash-attention.git (fetch)
origin https://github.com/Dao-AILab/flash-attention.git (push)
[root@guyen-MS-7B22 flash-attention]# cat /etc/os-release ; uname -r
NAME="CentOS Stream"
VERSION="9"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="9"
PLATFORM_ID="platform:el9"
PRETTY_NAME="CentOS Stream 9"
ANSI_COLOR="0;31"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:centos:centos:9"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux 9"
REDHAT_SUPPORT_PRODUCT_VERSION="CentOS Stream"
5.19.0-38-generic
We have prebuilt CUDA wheels that will be downloaded if you install with pip install flash-attn --no-build-isolation. Then you wouldn't have to compile things yourself.
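For reference, a minimal sketch of that route (assuming a matching prebuilt wheel exists for your Python/torch/CUDA combination):

```bash
# Install build-time helpers the package expects, then let pip pull the prebuilt CUDA wheel.
# --no-build-isolation makes any fallback source build use the torch already installed.
pip install packaging ninja
pip install flash-attn --no-build-isolation
```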
Yeah, I saw that. However, can you help with the build issue? My environment specifically requires building it manually... Is there a stable release branch where the build is also reliable?
Environments are so different that it's hard to know, and I'm not an expert on compiling or building. There's no obvious error message pointing to a specific line in your log.
I use NVIDIA's PyTorch Docker image, which has all the libraries and compilers ready.
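For anyone wanting to reproduce that environment, a rough sketch (the NGC tag below is only an example; pick a current one):

```bash
# NGC PyTorch images ship a matched CUDA toolkit, compiler, and torch build.
docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:23.12-py3 bash
# inside the container:
pip install flash-attn --no-build-isolation
```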
You can try setting MAX_JOBS=4, as mentioned in the README, in case the build failed because of OOM.
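Concretely, something like this (a sketch; each parallel nvcc job can use several GB of RAM, so scale the count to your machine):

```bash
# Cap the number of parallel compilation jobs to reduce peak memory use.
MAX_JOBS=4 python setup.py develop
# or, when installing from source via pip:
MAX_JOBS=4 pip install flash-attn --no-build-isolation
```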
MAX_JOBS=4 failed with a similar error; I don't believe it is OOM.
Yeah, then I don't know how to fix it.
Hmm, is there a way you can forward this to someone who can? If no one here can help, where else can I get help?
I am getting the same error with H100 GPUs. I have tried all the different installation methods, and right now I am trying in a fresh conda environment. Still, I get this error (truncated; a quick environment check is sketched after the log):
_run_ninja_build(
File "/miniconda3/envs/pytorch_cuda/lib/python3.12/site-packages/torch/utils/cpp_extension.py", line 2112, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for flash-attn
Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects
@tridao any idea?
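One generic check worth doing before anything else (not specific to flash-attention): confirm that the nvcc on PATH, the CUDA version PyTorch was built against, and ninja itself all line up, since mismatches often surface as exactly this kind of opaque ninja failure. A sketch:

```bash
# CUDA toolkit driving nvcc vs. the CUDA runtime torch was built for.
nvcc --version | grep release
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.get_device_capability())"
# The extension build shells out to ninja, so make sure it is present and runnable.
ninja --version
```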
As of today, the build starts OK but takes forever. Any idea?
/usr/local/cuda-12.3/bin/nvcc -I/root/extdir/gg/git/flash-attention/csrc/flash_attn -I/root/extdir/gg/git/flash-attention/csrc/flash_attn/src -I/root/extdir/gg/git/flash-attention/csrc/cutlass/include -I/miniconda3/lib/python3.11/site-packages/torch/include -I/miniconda3/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/miniconda3/lib/python3.11/site-packages/torch/include/TH -I/miniconda3/lib/python3.11/site-packages/torch/include/THC -I/usr/local/cuda-12.3/include -I/miniconda3/include/python3.11 -c csrc/flash_attn/src/flash_fwd_split_hdim128_fp16_sm80.cu -o build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim128_fp16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
/usr/local/cuda-12.3/bin/nvcc -I/root/extdir/gg/git/flash-attention/csrc/flash_attn -I/root/extdir/gg/git/flash-attention/csrc/flash_attn/src -I/root/extdir/gg/git/flash-attention/csrc/cutlass/include -I/miniconda3/lib/python3.11/site-packages/torch/include -I/miniconda3/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/miniconda3/lib/python3.11/site-packages/torch/include/TH -I/miniconda3/lib/python3.11/site-packages/torch/include/THC -I/usr/local/cuda-12.3/include -I/miniconda3/include/python3.11 -c csrc/flash_attn/src/flash_fwd_split_hdim160_bf16_sm80.cu -o build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim160_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
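If the question is whether the build is stuck rather than just slow (there are dozens of heavily templated translation units, each compiled with `--threads 4`, so an hour or more is not unusual), a monitoring sketch using standard tools:

```bash
# Check that nvcc/cicc processes are still alive and see how much RAM they use.
watch -n 30 'ps -eo pid,etime,rss,cmd | grep -E "[n]vcc|[c]icc" | head; free -h'
# If the kernel OOM-killer has been terminating compiler jobs, it will show up here.
dmesg -T | grep -iE 'out of memory|killed process'
```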