
gfx1100 installation fails due to `fatal error: 'fmha_bwd.hpp' file not found`

Open ZhenyaPav opened this issue 1 year ago • 5 comments

As the title says, I'm unable to install the latest version through pip

pip install flash-attn --no-build-isolation
Collecting flash-attn
  Using cached flash_attn-2.6.3.tar.gz (2.6 MB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: torch in ./venv/lib/python3.11/site-packages (from flash-attn) (2.3.1+rocm6.0)
Requirement already satisfied: einops in ./venv/lib/python3.11/site-packages (from flash-attn) (0.8.0)
Requirement already satisfied: filelock in ./venv/lib/python3.11/site-packages (from torch->flash-attn) (3.13.1)
Requirement already satisfied: typing-extensions>=4.8.0 in ./venv/lib/python3.11/site-packages (from torch->flash-attn) (4.12.2)
Requirement already satisfied: sympy in ./venv/lib/python3.11/site-packages (from torch->flash-attn) (1.12)
Requirement already satisfied: networkx in ./venv/lib/python3.11/site-packages (from torch->flash-attn) (2.8.8)
Requirement already satisfied: jinja2 in ./venv/lib/python3.11/site-packages (from torch->flash-attn) (3.1.4)
Requirement already satisfied: fsspec in ./venv/lib/python3.11/site-packages (from torch->flash-attn) (2024.2.0)
Requirement already satisfied: pytorch-triton-rocm==2.3.1 in ./venv/lib/python3.11/site-packages (from torch->flash-attn) (2.3.1)
Requirement already satisfied: MarkupSafe>=2.0 in ./venv/lib/python3.11/site-packages (from jinja2->torch->flash-attn) (2.1.5)
Requirement already satisfied: mpmath>=0.19 in ./venv/lib/python3.11/site-packages (from sympy->torch->flash-attn) (1.3.0)
Building wheels for collected packages: flash-attn
  Building wheel for flash-attn (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [208 lines of output]
      fatal: not a git repository (or any parent up to mount point /)
      Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
      /home/zhenyapav/Projects/text-generation-webui/venv/bin/python3.11: can't open file '/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/example/ck_tile/01_fmha/generate.py': [Errno 2] No such file or directory
      /home/zhenyapav/Projects/text-generation-webui/venv/bin/python3.11: can't open file '/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/example/ck_tile/01_fmha/generate.py': [Errno 2] No such file or directory
      
      
      torch.__version__  = 2.3.1+rocm6.0
      
      
      /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/flash_common.hpp -> /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/flash_common_hip.hpp [skipped, already hipified]
      /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/flash_api.cu -> /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/flash_api.hip [skipped, already hipified]
      /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/mha_bwd.cu -> /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/mha_bwd.hip [skipped, already hipified]
      /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/mha_fwd.cu -> /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/mha_fwd.hip [skipped, already hipified]
      /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/mha_varlen_bwd.cu -> /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/mha_varlen_bwd.hip [skipped, already hipified]
      /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/mha_varlen_fwd.cu -> /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/mha_varlen_fwd.hip [skipped, already hipified]
      Successfully preprocessed all matching files.
      Total number of unsupported CUDA function calls: 0
      
      
      Total number of replaced kernel launches: 0
      running bdist_wheel
      Guessing wheel URL:  https://github.com/Dao-AILab/flash-attention/releases/download/v2.6.3/flash_attn-2.6.3+rocm60torch2.3cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
      Precompiled wheel not found. Building from source...
      running build
      running build_py
      creating build/lib.linux-x86_64-cpython-311
      creating build/lib.linux-x86_64-cpython-311/flash_attn
      copying flash_attn/__init__.py -> build/lib.linux-x86_64-cpython-311/flash_attn
      copying flash_attn/bert_padding.py -> build/lib.linux-x86_64-cpython-311/flash_attn
      copying flash_attn/flash_attn_interface.py -> build/lib.linux-x86_64-cpython-311/flash_attn
      copying flash_attn/flash_attn_triton.py -> build/lib.linux-x86_64-cpython-311/flash_attn
      copying flash_attn/flash_attn_triton_og.py -> build/lib.linux-x86_64-cpython-311/flash_attn
      copying flash_attn/flash_blocksparse_attention.py -> build/lib.linux-x86_64-cpython-311/flash_attn
      copying flash_attn/flash_blocksparse_attn_interface.py -> build/lib.linux-x86_64-cpython-311/flash_attn
      copying flash_attn/fused_softmax.py -> build/lib.linux-x86_64-cpython-311/flash_attn
      creating build/lib.linux-x86_64-cpython-311/hopper
      copying hopper/__init__.py -> build/lib.linux-x86_64-cpython-311/hopper
      copying hopper/benchmark_attn.py -> build/lib.linux-x86_64-cpython-311/hopper
      copying hopper/flash_attn_interface.py -> build/lib.linux-x86_64-cpython-311/hopper
      copying hopper/setup.py -> build/lib.linux-x86_64-cpython-311/hopper
      copying hopper/test_flash_attn.py -> build/lib.linux-x86_64-cpython-311/hopper
      creating build/lib.linux-x86_64-cpython-311/flash_attn/layers
      copying flash_attn/layers/__init__.py -> build/lib.linux-x86_64-cpython-311/flash_attn/layers
      copying flash_attn/layers/patch_embed.py -> build/lib.linux-x86_64-cpython-311/flash_attn/layers
      copying flash_attn/layers/rotary.py -> build/lib.linux-x86_64-cpython-311/flash_attn/layers
      creating build/lib.linux-x86_64-cpython-311/flash_attn/losses
      copying flash_attn/losses/__init__.py -> build/lib.linux-x86_64-cpython-311/flash_attn/losses
      copying flash_attn/losses/cross_entropy.py -> build/lib.linux-x86_64-cpython-311/flash_attn/losses
      creating build/lib.linux-x86_64-cpython-311/flash_attn/models
      copying flash_attn/models/__init__.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
      copying flash_attn/models/baichuan.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
      copying flash_attn/models/bert.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
      copying flash_attn/models/bigcode.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
      copying flash_attn/models/btlm.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
      copying flash_attn/models/falcon.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
      copying flash_attn/models/gpt.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
      copying flash_attn/models/gpt_neox.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
      copying flash_attn/models/gptj.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
      copying flash_attn/models/llama.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
      copying flash_attn/models/opt.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
      copying flash_attn/models/vit.py -> build/lib.linux-x86_64-cpython-311/flash_attn/models
      creating build/lib.linux-x86_64-cpython-311/flash_attn/modules
      copying flash_attn/modules/__init__.py -> build/lib.linux-x86_64-cpython-311/flash_attn/modules
      copying flash_attn/modules/block.py -> build/lib.linux-x86_64-cpython-311/flash_attn/modules
      copying flash_attn/modules/embedding.py -> build/lib.linux-x86_64-cpython-311/flash_attn/modules
      copying flash_attn/modules/mha.py -> build/lib.linux-x86_64-cpython-311/flash_attn/modules
      copying flash_attn/modules/mlp.py -> build/lib.linux-x86_64-cpython-311/flash_attn/modules
      creating build/lib.linux-x86_64-cpython-311/flash_attn/ops
      copying flash_attn/ops/__init__.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops
      copying flash_attn/ops/activations.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops
      copying flash_attn/ops/fused_dense.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops
      copying flash_attn/ops/layer_norm.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops
      copying flash_attn/ops/rms_norm.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops
      creating build/lib.linux-x86_64-cpython-311/flash_attn/utils
      copying flash_attn/utils/__init__.py -> build/lib.linux-x86_64-cpython-311/flash_attn/utils
      copying flash_attn/utils/benchmark.py -> build/lib.linux-x86_64-cpython-311/flash_attn/utils
      copying flash_attn/utils/distributed.py -> build/lib.linux-x86_64-cpython-311/flash_attn/utils
      copying flash_attn/utils/generation.py -> build/lib.linux-x86_64-cpython-311/flash_attn/utils
      copying flash_attn/utils/pretrained.py -> build/lib.linux-x86_64-cpython-311/flash_attn/utils
      creating build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton
      copying flash_attn/ops/triton/__init__.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton
      copying flash_attn/ops/triton/cross_entropy.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton
      copying flash_attn/ops/triton/k_activations.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton
      copying flash_attn/ops/triton/layer_norm.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton
      copying flash_attn/ops/triton/linear.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton
      copying flash_attn/ops/triton/mlp.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton
      copying flash_attn/ops/triton/rotary.py -> build/lib.linux-x86_64-cpython-311/flash_attn/ops/triton
      running build_ext
      building 'flash_attn_2_cuda' extension
      creating /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/build/temp.linux-x86_64-cpython-311
      creating /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/build/temp.linux-x86_64-cpython-311/csrc
      creating /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_ck
      Emitting ninja build file /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/build/temp.linux-x86_64-cpython-311/build.ninja...
      Compiling objects...
      Using envvar MAX_JOBS (3) as the number of workers...
      [1/5] /opt/rocm/bin/hipcc  -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/include -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/library/include -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/example/ck_tile/01_fmha -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/TH -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/THC -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm/include -I/home/zhenyapav/Projects/text-generation-webui/venv/include -I/usr/include/python3.11 -c -c /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/mha_bwd.hip -o /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_ck/mha_bwd.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -mllvm -enable-post-misched=0 -DCK_TILE_FMHA_FWD_FAST_EXP2=1 -fgpu-flush-denormals-to-zero -DCK_ENABLE_BF16 -DCK_ENABLE_BF8 -DCK_ENABLE_FP16 -DCK_ENABLE_FP32 -DCK_ENABLE_FP64 -DCK_ENABLE_FP8 -DCK_ENABLE_INT8 -DCK_USE_XDL -DUSE_PROF_API=1 -D__HIP_PLATFORM_HCC__=1 --offload-arch=native -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
      FAILED: /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_ck/mha_bwd.o
      /opt/rocm/bin/hipcc  -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/include -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/library/include -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/example/ck_tile/01_fmha -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/TH -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/THC -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm/include -I/home/zhenyapav/Projects/text-generation-webui/venv/include -I/usr/include/python3.11 -c -c /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/mha_bwd.hip -o /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_ck/mha_bwd.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -mllvm -enable-post-misched=0 -DCK_TILE_FMHA_FWD_FAST_EXP2=1 -fgpu-flush-denormals-to-zero -DCK_ENABLE_BF16 -DCK_ENABLE_BF8 -DCK_ENABLE_FP16 -DCK_ENABLE_FP32 -DCK_ENABLE_FP64 -DCK_ENABLE_FP8 -DCK_ENABLE_INT8 -DCK_USE_XDL -DUSE_PROF_API=1 -D__HIP_PLATFORM_HCC__=1 --offload-arch=native -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
      /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/mha_bwd.hip:8:10: fatal error: 'fmha_bwd.hpp' file not found
      #include "fmha_bwd.hpp"
               ^~~~~~~~~~~~~~
      1 error generated when compiling for gfx1100.
      [2/5] /opt/rocm/bin/hipcc  -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/include -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/library/include -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/example/ck_tile/01_fmha -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/TH -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/THC -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm/include -I/home/zhenyapav/Projects/text-generation-webui/venv/include -I/usr/include/python3.11 -c -c /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/mha_fwd.hip -o /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_ck/mha_fwd.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -mllvm -enable-post-misched=0 -DCK_TILE_FMHA_FWD_FAST_EXP2=1 -fgpu-flush-denormals-to-zero -DCK_ENABLE_BF16 -DCK_ENABLE_BF8 -DCK_ENABLE_FP16 -DCK_ENABLE_FP32 -DCK_ENABLE_FP64 -DCK_ENABLE_FP8 -DCK_ENABLE_INT8 -DCK_USE_XDL -DUSE_PROF_API=1 -D__HIP_PLATFORM_HCC__=1 --offload-arch=native -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
      FAILED: /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_ck/mha_fwd.o
      /opt/rocm/bin/hipcc  -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/include -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/library/include -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/example/ck_tile/01_fmha -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/TH -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/THC -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm/include -I/home/zhenyapav/Projects/text-generation-webui/venv/include -I/usr/include/python3.11 -c -c /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/mha_fwd.hip -o /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_ck/mha_fwd.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -mllvm -enable-post-misched=0 -DCK_TILE_FMHA_FWD_FAST_EXP2=1 -fgpu-flush-denormals-to-zero -DCK_ENABLE_BF16 -DCK_ENABLE_BF8 -DCK_ENABLE_FP16 -DCK_ENABLE_FP32 -DCK_ENABLE_FP64 -DCK_ENABLE_FP8 -DCK_ENABLE_INT8 -DCK_USE_XDL -DUSE_PROF_API=1 -D__HIP_PLATFORM_HCC__=1 --offload-arch=native -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
      /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/mha_fwd.hip:8:10: fatal error: 'fmha_fwd.hpp' file not found
      #include "fmha_fwd.hpp"
               ^~~~~~~~~~~~~~
      1 error generated when compiling for gfx1100.
      [3/5] /opt/rocm/bin/hipcc  -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/include -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/library/include -I/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/composable_kernel/example/ck_tile/01_fmha -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/TH -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/THC -I/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm/include -I/home/zhenyapav/Projects/text-generation-webui/venv/include -I/usr/include/python3.11 -c -c /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/csrc/flash_attn_ck/flash_api.hip -o /tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/build/temp.linux-x86_64-cpython-311/csrc/flash_attn_ck/flash_api.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DCUDA_HAS_FP16=1 -D__HIP_NO_HALF_OPERATORS__=1 -D__HIP_NO_HALF_CONVERSIONS__=1 -O3 -std=c++17 -mllvm -enable-post-misched=0 -DCK_TILE_FMHA_FWD_FAST_EXP2=1 -fgpu-flush-denormals-to-zero -DCK_ENABLE_BF16 -DCK_ENABLE_BF8 -DCK_ENABLE_FP16 -DCK_ENABLE_FP32 -DCK_ENABLE_FP64 -DCK_ENABLE_FP8 -DCK_ENABLE_INT8 -DCK_USE_XDL -DUSE_PROF_API=1 -D__HIP_PLATFORM_HCC__=1 --offload-arch=native -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -fno-gpu-rdc
      ninja: build stopped: subcommand failed.
      Traceback (most recent call last):
        File "/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/setup.py", line 450, in run
          urllib.request.urlretrieve(wheel_url, wheel_filename)
        File "/usr/lib/python3.11/urllib/request.py", line 241, in urlretrieve
          with contextlib.closing(urlopen(url, data)) as fp:
                                  ^^^^^^^^^^^^^^^^^^
        File "/usr/lib/python3.11/urllib/request.py", line 216, in urlopen
          return opener.open(url, data, timeout)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/lib/python3.11/urllib/request.py", line 525, in open
          response = meth(req, response)
                     ^^^^^^^^^^^^^^^^^^^
        File "/usr/lib/python3.11/urllib/request.py", line 634, in http_response
          response = self.parent.error(
                     ^^^^^^^^^^^^^^^^^^
        File "/usr/lib/python3.11/urllib/request.py", line 563, in error
          return self._call_chain(*args)
                 ^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/lib/python3.11/urllib/request.py", line 496, in _call_chain
          result = func(*args)
                   ^^^^^^^^^^^
        File "/usr/lib/python3.11/urllib/request.py", line 643, in http_error_default
          raise HTTPError(req.full_url, code, msg, hdrs, fp)
      urllib.error.HTTPError: HTTP Error 404: Not Found
      
      During handling of the above exception, another exception occurred:
      
      Traceback (most recent call last):
        File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 2107, in _run_ninja_build
          subprocess.run(
        File "/usr/lib/python3.11/subprocess.py", line 571, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '3']' returned non-zero exit status 1.
      
      The above exception was the direct cause of the following exception:
      
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/setup.py", line 490, in <module>
          setup(
        File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/__init__.py", line 87, in setup
          return distutils.core.setup(**attrs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 185, in setup
          return run_commands(dist)
                 ^^^^^^^^^^^^^^^^^^
        File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
          dist.run_commands()
        File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 968, in run_commands
          self.run_command(cmd)
        File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/dist.py", line 1217, in run_command
          super().run_command(command)
        File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
          cmd_obj.run()
        File "/tmp/pip-install-dpvuc733/flash-attn_dffe3d300d2a42d7938a89c8e6a69c2d/setup.py", line 467, in run
          super().run()
        File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/wheel/bdist_wheel.py", line 368, in run
          self.run_command("build")
        File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 319, in run_command
          self.distribution.run_command(command)
        File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/dist.py", line 1217, in run_command
          super().run_command(command)
        File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
          cmd_obj.run()
        File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/_distutils/command/build.py", line 132, in run
          self.run_command(cmd_name)
        File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 319, in run_command
          self.distribution.run_command(command)
        File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/dist.py", line 1217, in run_command
          super().run_command(command)
        File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
          cmd_obj.run()
        File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/command/build_ext.py", line 84, in run
          _build_ext.run(self)
        File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
          self.build_extensions()
        File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 870, in build_extensions
          build_ext.build_extensions(self)
        File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 466, in build_extensions
          self._build_extensions_serial()
        File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 492, in _build_extensions_serial
          self.build_extension(ext)
        File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
          _build_ext.build_extension(self, ext)
        File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/Cython/Distutils/build_ext.py", line 135, in build_extension
          super(build_ext, self).build_extension(ext)
        File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 547, in build_extension
          objects = self.compiler.compile(
                    ^^^^^^^^^^^^^^^^^^^^^^
        File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 683, in unix_wrap_ninja_compile
          _write_ninja_file_and_compile_objects(
        File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1783, in _write_ninja_file_and_compile_objects
          _run_ninja_build(
        File "/home/zhenyapav/Projects/text-generation-webui/venv/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 2123, in _run_ninja_build
          raise RuntimeError(message) from e
      RuntimeError: Error compiling objects for extension
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for flash-attn
  Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects

ZhenyaPav avatar Jul 29 '24 14:07 ZhenyaPav

fmha_bwd.hpp is part of a ROCm module, Composable Kernel. It might not have been installed by default.
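
For what it's worth, the pip log above also shows that `generate.py` under `csrc/composable_kernel/example/ck_tile/01_fmha` could not be opened, which is where `fmha_bwd.hpp` would normally be generated during the build. As a quick sanity check, a sketch like this can tell you whether any system ROCm install ships the header at all (the prefixes are assumptions; adjust for your distro's layout):

```shell
# Sketch: search typical ROCm install prefixes for the missing header.
found=""
for d in /opt/rocm /opt/rocm-*; do
  [ -d "$d" ] || continue
  found="$found$(find "$d" -name 'fmha_bwd.hpp' 2>/dev/null)"
done
if [ -n "$found" ]; then
  echo "header found:$found"
else
  echo "header not found in any checked prefix"
fi
```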

selphea avatar Jul 30 '24 10:07 selphea

> fmha_bwd.hpp is part of a ROCm module, Composable Kernel. It might not have been installed by default.

I do have extra/composable-kernel 6.0.2-1 installed

ZhenyaPav avatar Jul 31 '24 10:07 ZhenyaPav

+1

unclemusclez avatar Aug 20 '24 12:08 unclemusclez

Tried installing it with ROCm 6.2 (the opencl-amd-dev package on Arch, which includes the Composable Kernel files as well); same error.

ZhenyaPav avatar Aug 23 '24 22:08 ZhenyaPav

This will not work on the 7900 (Navi31).

unclemusclez avatar Aug 23 '24 22:08 unclemusclez

I had the same problem. I was able to work around it by cloning the git repo and installing from source:

git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
git checkout 418d677192b483dfc1decfdf9aadca40b402485d  # v2.6.3
BUILD_TARGET=rocm python setup.py install

Also, remember that if you have limited CPU RAM, you might need to set the MAX_JOBS environment variable for the last command.
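
On the RAM point: each parallel hipcc job can use several gigabytes, so a rough sketch for picking a MAX_JOBS value might look like this (the ~4 GB-per-job budget is an assumption, not a measured figure):

```shell
# Sketch: derive a conservative MAX_JOBS from available memory,
# assuming roughly 4 GB per parallel hipcc job, capped at CPU count.
mem_gb=$(awk '/MemAvailable/ {print int($2 / 1048576)}' /proc/meminfo)
mem_gb=${mem_gb:-4}              # fall back if MemAvailable is absent
jobs=$(( mem_gb / 4 ))
if [ "$jobs" -lt 1 ]; then jobs=1; fi
cpus=$(nproc)
if [ "$jobs" -gt "$cpus" ]; then jobs=$cpus; fi
echo "suggested MAX_JOBS=$jobs"
# Then, for example: MAX_JOBS=$jobs BUILD_TARGET=rocm python setup.py install
```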

calebthomas259 avatar Oct 25 '24 14:10 calebthomas259

> I had the same problem. I was able to work around it by cloning the git repo and installing from source:
>
> git clone https://github.com/Dao-AILab/flash-attention.git
> cd flash-attention
> git checkout 418d677192b483dfc1decfdf9aadca40b402485d  # v2.6.3
> BUILD_TARGET=rocm python setup.py install
>
> Also, remember that if you have limited CPU RAM, you might need to set the MAX_JOBS environment variable for the last command.

What ROCm version are you using? I'm getting compilation errors with 6.0

ZhenyaPav avatar Nov 09 '24 20:11 ZhenyaPav

> > I had the same problem. I was able to work around it by cloning the git repo and installing from source:
> >
> > git clone https://github.com/Dao-AILab/flash-attention.git
> > cd flash-attention
> > git checkout 418d677192b483dfc1decfdf9aadca40b402485d  # v2.6.3
> > BUILD_TARGET=rocm python setup.py install
> >
> > Also, remember that if you have limited CPU RAM, you might need to set the MAX_JOBS environment variable for the last command.
>
> What ROCm version are you using? I'm getting compilation errors with 6.0

I think when I wrote the quoted commands, I was unknowingly using ROCm 5.7, which had been pre-installed system-wide... this issue might also be relevant

calebthomas259 avatar Nov 10 '24 09:11 calebthomas259

I was able to work around the issue by copying the missing files over manually:

cp -pv /opt/rocm-6.3.1/include/ck_tile/ops/*.hpp ./flash-attention-2.7.0-cktile/csrc/flash_attn_ck/
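
A slightly more defensive version of that copy might look like this (a sketch: the source-tree path is taken from the command above, and the ROCM_PREFIX default is an assumption you may need to override for your ROCm version):

```shell
# Sketch: copy the ck_tile headers only if both directories exist.
ROCM_PREFIX="${ROCM_PREFIX:-/opt/rocm}"
src="$ROCM_PREFIX/include/ck_tile/ops"
dst="./flash-attention-2.7.0-cktile/csrc/flash_attn_ck"
if [ -d "$src" ] && [ -d "$dst" ]; then
  cp -pv "$src"/*.hpp "$dst"/
  status="copied"
else
  status="skipped ($src or $dst missing)"
fi
echo "$status"
```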

rtlinux avatar Jan 04 '25 06:01 rtlinux