flash-attention
gfx1100 installation fails due to `fatal error: 'fmha_bwd.hpp' file not found`
As the title says, I'm unable to install the latest version through pip:
pip install flash-attn --no-build-isolation
Collecting flash-attn
Using cached flash_attn-2.6.3.tar.gz (2.6 MB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: torch in ./venv/lib/python3.11/site-packages (from flash-attn) (2.3.1+rocm6.0)
Requirement already satisfied: einops in ./venv/lib/python3.11/site-packages (from flash-attn) (0.8.0)
Requirement already satisfied: filelock in ./venv/lib/python3.11/site-packages (from torch->flash-attn) (3.13.1)
Requirement already satisfied: typing-extensions>=4.8.0 in ./venv/lib/python3.11/site-packages (from torch->flash-attn) (4.12.2)
Requirement already satisfied: sympy in ./venv/lib/python3.11/site-packages (from torch->flash-attn) (1.12)
Requirement already satisfied: networkx in ./venv/lib/python3.11/site-packages (from torch->flash-attn) (2.8.8)
Requirement already satisfied: jinja2 in ./venv/lib/python3.11/site-packages (from torch->flash-attn) (3.1.4)
Requirement already satisfied: fsspec in ./venv/lib/python3.11/site-packages (from torch->flash-attn) (2024.2.0)
Requirement already satisfied: pytorch-triton-rocm==2.3.1 in ./venv/lib/python3.11/site-packages (from torch->flash-attn) (2.3.1)
Requirement already satisfied: MarkupSafe>=2.0 in ./venv/lib/python3.11/site-packages (from jinja2->torch->flash-attn) (2.1.5)
Requirement already satisfied: mpmath>=0.19 in ./venv/lib/python3.11/site-packages (from sympy->torch->flash-attn) (1.3.0)
Building wheels for collected packages: flash-attn
Building wheel for flash-attn (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [208 lines of output]
fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
/home/zhenyapav/Projects/text-generation-webui/venv/bin/python3.11: can't open file '/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/example/ck_tile/01_fmha/generate.py': [Errno 2] No such file or directory
/home/zhenyapav/Projects/text-generation-webui/venv/bin/python3.11: can't open file '/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/example/ck_tile/01_fmha/generate.py': [Errno 2] No such file or directory
torch.__version__ = 2.3.1+rocm6.0
/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/flash_common.hpp -> /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/flash_common_hip.hpp [skipped, already hipified]
/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/flash_api.cu -> /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/flash_api.hip [skipped, already hipified]
/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/mha_bwd.cu -> /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/mha_bwd.hip [skipped, already hipified]
/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/mha_fwd.cu -> /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/mha_fwd.hip [skipped, already hipified]
/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/mha_varlen_bwd.cu -> /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/mha_varlen_bwd.hip [skipped, already hipified]
/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/mha_varlen_fwd.cu -> /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/mha_varlen_fwd.hip [skipped, already hipified]
Successfully preprocessed all matching files.
Total number of unsupported CUDA function calls: 0
Total number of replaced kernel launches: 0
running bdist_wheel
Guessing wheel URL: https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+rocm60torch2.3cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
Precompiled wheel not found. Building from source...
running build
running build_py
creating build/lib.linux-x86_64-cpython-311
creating build/lib.linux-x86_64-cpython-311/flash_attn
copying flash_attn/__init__.py -> build/lib.linux-x86_64-cpython-311/flash_attn
copying flash_attn/bert_padding.py -> build/lib.linux-x86_64-cpython-311/flash_attn
copying flash_attn/flash_attn_interface.py -> build/lib.linux-x86_64-cpython-311/flash_attn
copying flash_attn/flash_attn_triton.py -> build/lib.linux-x86_64-cpython-311/flash_attn
copying flash_attn/flash_attn_triton_og.py -> build/lib.linux-x86_64-cpython-311/flash_attn
copying flash_attn/flash_blocksparse_attention.py -> build/lib.linux-x86_64-cpython-311/flash_attn
copying flash_attn/flash_blocksparse_attn_interface.py -> build/lib.linux-x86_64-cpython-311/flash_attn
copying flash_attn/fused_softmax.py -> build/lib.linux-x86_64-cpython-311/flash_attn
creating build/lib.linux-x86_64-cpython-311/hopper
copying hopper/__init__.py -> build/lib.linux-x86_64-cpython-311/hopper
copying hopper/benchmark_attn.py -> build/lib.linux-x86_64-cpython-311/hopper
copying hopper/flash_attn_interface.py -> build/lib.linux-x86_64-cpython-311/hopper
copying hopper/setup.py -> build/lib.linux-x86_64-cpython-311/hopper
copying hopper/test_flash_attn.py -> build/lib.linux-x86_64-cpython-311/hopper
creating build/lib.linux-x86_64-cpython-311/flash_attn/layers
copying flash_attn/layers/__init__.py -> build/lib.linux-x86_64-cpython-311/flash_attn/layers
copying flash_attn/layers/patch_embed.py -> build/lib.linux-x86_64-cpython-311/flash_attn/layers
copying flash_attn/layers/rotary.py -> build/lib.linux-x86_64-cpython-311/flash_attn/layers
creating build/lib.linux-x86_64-cpython-311/flash_attn/losses
copying flash_attn/losses/__init__.py -> build/lib.linux-x86_64-cpython-311/flash_attn/losses
copying flash_attn/losses/cross_entropy.py -> build/lib.linux-x86_64-cpython-311/flash_attn/losses
creating build/lib.linux-x86_64-cpython-311/flash_attn/models
copying flash_attn/models/__init__.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
copying flash_attn/models/baichuan.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
copying flash_attn/models/bert.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
copying flash_attn/models/bigcode.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
copying flash_attn/models/btlm.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
copying flash_attn/models/falcon.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
copying flash_attn/models/gpt.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
copying flash_attn/models/gpt_neox.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
copying flash_attn/models/gptj.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
copying flash_attn/models/llama.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
copying flash_attn/models/opt.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
copying flash_attn/models/vit.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
creating build/lib.linux-x86_64-cpython-311/flash_attn/modules
copying flash_attn/modules/__init__.py -> build/lib.linux-x86_64-cpython-311/flash_attn/modules
copying flash_attn/modules/block.py -> build/lib.linux-x86_64-cpython-311/flash_attn/modules
copying flash_attn/modules/embedding.py -> build/lib.linux-x86_64-cpython-311/flash_attn/modules
copying flash_attn/modules/mha.py -> build/lib.linux-x86_64-cpython-311/flash_attn/modules
copying flash_attn/modules/mlp.py -> build/lib.linux-x86_64-cpython-311/flash_attn/modules
creating build/lib.linux-x86_64-cpython-311/flash_attn/ops
copying flash_attn/ops/__init__.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops
copying flash_attn/ops/activations.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops
copying flash_attn/ops/fused_dense.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops
copying flash_attn/ops/layer_norm.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops
copying flash_attn/ops/rms_norm.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops
creating build/lib.linux-x86_64-cpython-311/flash_attn/utils
copying flash_attn/utils/__init__.py -> build/lib.linux-x86_64-cpython-311/flash_attn/utils
copying flash_attn/utils/benchmark.py -> build/lib.linux-x86_64-cpython-311/flash_attn/utils
copying flash_attn/utils/distributed.py -> build/lib.linux-x86_64-cpython-311/flash_attn/utils
copying flash_attn/utils/generation.py -> build/lib.linux-x86_64-cpython-311/flash_attn/utils
copying flash_attn/utils/pretrained.py -> build/lib.linux-x86_64-cpython-311/flash_attn/utils
creating build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton
copying flash_attn/ops/triton/__init__.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton
copying flash_attn/ops/triton/cross_entropy.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton
copying flash_attn/ops/triton/k_activations.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton
copying flash_attn/ops/triton/layer_norm.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton
copying flash_attn/ops/triton/linear.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton
copying flash_attn/ops/triton/mlp.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton
copying flash_attn/ops/triton/rotary.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton
running build_ext
building 'flash_attn_2_cuda' extension
creating /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/build/temp.linux-x86_64-cpython-311
creating /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/build/temp.linux-x86_64-cpython-311/csrc
creating /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_ck
Emitting ninja build file /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/build/temp.linux-x86_64-cpython-311/build.ninja...
Compiling objects...
Using envvar MAX_JOBS (3) as the number of workers...
[1/5] /opt/rocm/bin/hipcc -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/include -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/library/include -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/example/ck_tile/01_fmha -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/TH -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/THC -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm/include -I/home/zhenyapav/Projects/text-generation-webui/venv/include -I/usr/include/python3.11 -c -c /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/mha_bwd.hip -o /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_ck/mha_bwd.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -mllvm -enable-post-misched=0 -DCK_TILE_FMHA_FWD_FAST_EXP2=1 -fgpu-flush-denormals-to-zero -DCK_ENABLE_BF16 -DCK_ENABLE_BF8 -DCK_ENABLE_FP16 -DCK_ENABLE_FP32 -DCK_ENABLE_FP64 -DCK_ENABLE_FP8 -DCK_ENABLE_INT8 -DCK_USE_XDL -DUSE_PROF_API=1 -D__HIP_PLATFORM_HCC__=1 --offload-arch=native -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
FAILED: /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_ck/mha_bwd.o
/opt/rocm/bin/hipcc -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/include -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/library/include -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/example/ck_tile/01_fmha -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/TH -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/THC -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm/include -I/home/zhenyapav/Projects/text-generation-webui/venv/include -I/usr/include/python3.11 -c -c /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/mha_bwd.hip -o /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_ck/mha_bwd.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -mllvm -enable-post-misched=0 -DCK_TILE_FMHA_FWD_FAST_EXP2=1 -fgpu-flush-denormals-to-zero -DCK_ENABLE_BF16 -DCK_ENABLE_BF8 -DCK_ENABLE_FP16 -DCK_ENABLE_FP32 -DCK_ENABLE_FP64 -DCK_ENABLE_FP8 -DCK_ENABLE_INT8 -DCK_USE_XDL -DUSE_PROF_API=1 -D__HIP_PLATFORM_HCC__=1 --offload-arch=native -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/mha_bwd.hip:8:10: fatal error: 'fmha_bwd.hpp' file not found
#include "fmha_bwd.hpp"
^~~~~~~~~~~~~~
1 error generated when compiling for gfx1100.
[2/5] /opt/rocm/bin/hipcc -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/include -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/library/include -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/example/ck_tile/01_fmha -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/TH -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/THC -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm/include -I/home/zhenyapav/Projects/text-generation-webui/venv/include -I/usr/include/python3.11 -c -c /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/mha_fwd.hip -o /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_ck/mha_fwd.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -mllvm -enable-post-misched=0 -DCK_TILE_FMHA_FWD_FAST_EXP2=1 -fgpu-flush-denormals-to-zero -DCK_ENABLE_BF16 -DCK_ENABLE_BF8 -DCK_ENABLE_FP16 -DCK_ENABLE_FP32 -DCK_ENABLE_FP64 -DCK_ENABLE_FP8 -DCK_ENABLE_INT8 -DCK_USE_XDL -DUSE_PROF_API=1 -D__HIP_PLATFORM_HCC__=1 --offload-arch=native -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
FAILED: /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_ck/mha_fwd.o
/opt/rocm/bin/hipcc -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/include -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/library/include -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/example/ck_tile/01_fmha -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/TH -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/THC -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm/include -I/home/zhenyapav/Projects/text-generation-webui/venv/include -I/usr/include/python3.11 -c -c /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/mha_fwd.hip -o /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_ck/mha_fwd.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -mllvm -enable-post-misched=0 -DCK_TILE_FMHA_FWD_FAST_EXP2=1 -fgpu-flush-denormals-to-zero -DCK_ENABLE_BF16 -DCK_ENABLE_BF8 -DCK_ENABLE_FP16 -DCK_ENABLE_FP32 -DCK_ENABLE_FP64 -DCK_ENABLE_FP8 -DCK_ENABLE_INT8 -DCK_USE_XDL -DUSE_PROF_API=1 -D__HIP_PLATFORM_HCC__=1 --offload-arch=native -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/mha_fwd.hip:8:10: fatal error: 'fmha_fwd.hpp' file not found
#include "fmha_fwd.hpp"
^~~~~~~~~~~~~~
1 error generated when compiling for gfx1100.
[3/5] /opt/rocm/bin/hipcc -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/include -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/library/include -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/example/ck_tile/01_fmha -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/TH -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/THC -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm/include -I/home/zhenyapav/Projects/text-generation-webui/venv/include -I/usr/include/python3.11 -c -c /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/flash_api.hip -o /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_ck/flash_api.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -mllvm -enable-post-misched=0 -DCK_TILE_FMHA_FWD_FAST_EXP2=1 -fgpu-flush-denormals-to-zero -DCK_ENABLE_BF16 -DCK_ENABLE_BF8 -DCK_ENABLE_FP16 -DCK_ENABLE_FP32 -DCK_ENABLE_FP64 -DCK_ENABLE_FP8 -DCK_ENABLE_INT8 -DCK_USE_XDL -DUSE_PROF_API=1 -D__HIP_PLATFORM_HCC__=1 --offload-arch=native -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/setup.py", line 450, in run
urllib.request.urlretrieve(wheel_url, wheel_filename)
File "/usr/lib/python3.11/urllib/request.py", line 241, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/urllib/request.py", line 216, in urlopen
return opener.open(url, data, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/urllib/request.py", line 525, in open
response = meth(req, response)
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/urllib/request.py", line 634, in http_response
response = self.parent.error(
^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/urllib/request.py", line 563, in error
return self._call_chain(*args)
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/urllib/request.py", line 496, in _call_chain
result = func(*args)
^^^^^^^^^^^
File "/usr/lib/python3.11/urllib/request.py", line 643, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 2107, in _run_ninja_build
subprocess.run(
File "/usr/lib/python3.11/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '3']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/setup.py", line 490, in <module>
setup(
File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/__init__.py", line 87, in setup
return distutils.core.setup(**attrs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
^^^^^^^^^^^^^^^^^^
File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 968, in run_commands
self.run_command(cmd)
File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/dist.py", line 1217, in run_command
super().run_command(command)
File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
cmd_obj.run()
File "/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/setup.py", line 467, in run
super().run()
File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/wheel/bdist_wheel.py", line 368, in run
self.run_command("build")
File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 319, in run_command
self.distribution.run_command(command)
File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/dist.py", line 1217, in run_command
super().run_command(command)
File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
cmd_obj.run()
File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/_distutils/command/build.py", line 132, in run
self.run_command(cmd_name)
File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 319, in run_command
self.distribution.run_command(command)
File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/dist.py", line 1217, in run_command
super().run_command(command)
File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
cmd_obj.run()
File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/command/build_ext.py", line 84, in run
_build_ext.run(self)
File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
self.build_extensions()
File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 870, in build_extensions
build_ext.build_extensions(self)
File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 466, in build_extensions
self._build_extensions_serial()
File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 492, in _build_extensions_serial
self.build_extension(ext)
File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
_build_ext.build_extension(self, ext)
File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/Cython/Distutils/build_ext.py", line 135, in build_extension
super(build_ext, self).build_extension(ext)
File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 547, in build_extension
objects = self.compiler.compile(
^^^^^^^^^^^^^^^^^^^^^^
File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 683, in unix_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1783, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 2123, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for flash-attn
Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects
fmha_bwd.hpp is part of a ROCm component, Composable Kernel. It might not be installed by default.
I do have extra/composable-kernel 6.0.2-1 installed
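Having the package installed doesn't guarantee the compiler can see the header. A quick way to check whether fmha_bwd.hpp actually exists anywhere under your ROCm install is a short search sketch like this (the prefixes are assumptions; adjust them for your distro's layout):

```python
import glob
import os

def find_header(name, prefixes=("/opt/rocm*",)):
    """Search the include/ tree of each matching install prefix for a
    header file and return the full paths of any hits."""
    hits = []
    for prefix in prefixes:
        for root in glob.glob(prefix):
            for dirpath, _dirnames, filenames in os.walk(os.path.join(root, "include")):
                if name in filenames:
                    hits.append(os.path.join(dirpath, name))
    return hits

print(find_header("fmha_bwd.hpp") or "fmha_bwd.hpp not found under the searched prefixes")
```

If this prints nothing found, the build's -I paths can't help you: the CK-Tile FMHA headers simply aren't on disk, which matches the generate.py "No such file or directory" lines earlier in the log (the sdist is missing the composable_kernel submodule contents).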
+1
Tried installing it with ROCm 6.2 (the opencl-amd-dev package on Arch, which includes the Composable Kernel files as well); same error.
This will not work on a 7900 (Navi 31).
I had the same problem. I was able to successfully work around it by cloning the git repo and installing from source:
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
git checkout 418d677192b483dfc1decfdf9aadca40b402485d # v2.6.3
BUILD_TARGET=rocm python setup.py install
Also, remember that if you have limited CPU RAM, you might need to set the MAX_JOBS environment variable for the last line
What ROCm version are you using? I'm getting compilation errors with 6.0
I think when I wrote the quoted commands, I was unknowingly using ROCm 5.7, which had been pre-installed system-wide... this issue might also be relevant
I was able to work around the issue by copying the missing files over manually:
cp -pv /opt/rocm-6.3.1/include/ck_tile/ops/*.hpp ./flash-attention-2.7.0-cktile/csrc/flash_attn_ck/
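The same manual copy can be scripted if you need to repeat it across rebuilds. This is a minimal sketch equivalent to the cp -p above; both the ROCm version in the source path and the checkout directory name are assumptions you must match to your setup:

```python
import glob
import os
import shutil

def copy_headers(src_glob, dst_dir):
    """Copy every file matching src_glob into dst_dir, preserving
    permissions and timestamps (like `cp -p`). Returns copied paths."""
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for path in glob.glob(src_glob):
        shutil.copy2(path, dst_dir)  # copy2 keeps mode and mtime
        copied.append(os.path.join(dst_dir, os.path.basename(path)))
    return copied

# Example paths -- adjust the ROCm prefix and checkout directory.
copy_headers("/opt/rocm-6.3.1/include/ck_tile/ops/*.hpp",
             "./flash-attention-2.7.0-cktile/csrc/flash_attn_ck/")
```

Note this only papers over the missing-submodule problem; the copied headers must come from a Composable Kernel version compatible with the flash-attention release you're building.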