FA3 setup failed: flash_bwd_hdim128_bf16_sm90.o flash_fwd_hdim128_bf16_sm90.o ,... compilation terminated.

Open GMALP opened this issue 1 year ago • 5 comments

H800 CUDA12.3 FA3 test

cd hopper
python setup.py install

FAILED: /test/flash-attention/hopper/build/temp.linux-x86_64-cpython-310/flash_fwd_hdim64_fp16_sm90.o
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /test/flash-attention/hopper/build/temp.linux-x86_64-cpython-310/flash_fwd_hdim64_fp16_sm90.o.d -I/test/flash-attention/csrc/cutlass/include -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /test/flash-attention/hopper/flash_fwd_hdim64_fp16_sm90.cu -o /test/flash-attention/hopper/build/temp.linux-x86_64-cpython-310/flash_fwd_hdim64_fp16_sm90.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v --ptxas-options=--verbose,--register-usage-level=10,--warn-on-local-memory-usage -lineinfo -DCUTLASS_DEBUG_TRACE_LEVEL=0 -DNDEBUG -gencode arch=compute_90a,code=sm_90a --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=flashattn_hopper_cuda -D_GLIBCXX_USE_CXX11_ABI=1
In file included from /test/flash-attention/hopper/flash_fwd_hdim64_fp16_sm90.cu:4:
/test/flash-attention/hopper/flash_fwd_launch_template.h:7:10: fatal error: cute/tensor.hpp: No such file or directory
    7 | #include "cute/tensor.hpp"
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
In file included from /test/flash-attention/hopper/flash_fwd_hdim64_fp16_sm90.cu:4:
/test/flash-attention/hopper/flash_fwd_launch_template.h:7:10: fatal error: cute/tensor.hpp: No such file or directory
    7 | #include "cute/tensor.hpp"
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
fatal : Could not open input file /tmp/tmpxft_00000f4d_00000000-7_flash_fwd_hdim64_fp16_sm90.cpp1.ii 1

GMALP avatar Aug 07 '24 07:08 GMALP
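
The only CUTLASS include path in the nvcc command above is -I/test/flash-attention/csrc/cutlass/include, so the missing header would have to be at csrc/cutlass/include/cute/tensor.hpp under the repository root. A minimal sketch of a check for that file follows. The function name check_cute_header is hypothetical and is not part of flash-attention.

```shell
# Hypothetical helper (not part of flash-attention): report whether the
# vendored CUTLASS header that the build needs is present.
# $1 is the repository root, e.g. /test/flash-attention in the log above.
check_cute_header() {
    repo_root="$1"
    header="$repo_root/csrc/cutlass/include/cute/tensor.hpp"
    if [ -f "$header" ]; then
        echo "found: $header"
    else
        # An empty csrc/cutlass directory usually means the submodule
        # was never initialized.
        echo "missing: $header"
        return 1
    fi
}
```

Calling check_cute_header /test/flash-attention before running the build would show whether the compiler can find the header at all.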

Question:

  1. The A100 isn't a Hopper-architecture GPU. Why would this happen?
  2. There seems to be a problem with cutlass. Have you synced cutlass?

foreverlms avatar Aug 07 '24 14:08 foreverlms

Question:

  1. The A100 isn't a Hopper-architecture GPU. Why would this happen?
    OK, I will test on the H800. The H800 gives the same error.
  2. There seems to be a problem with cutlass. Have you synced cutlass?
    I tried cutlass 3.5.0 and got the same error again.

image

GMALP avatar Aug 08 '24 00:08 GMALP

Is this cutlass installed on your system? FA2 uses its own cutlass in csrc/cutlass. You should clone this submodule and give it a try.

foreverlms avatar Aug 09 '24 02:08 foreverlms
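
The advice above can be sketched as a small shell function: initialize the csrc/cutlass submodule, then repeat the build steps from the original report. This is only a sketch under assumptions. The function name build_fa3 is hypothetical, and it assumes the checkout was made with git so that the submodule can be fetched.

```shell
# Sketch (hypothetical helper): fetch the vendored CUTLASS submodule,
# then rebuild the FA3 extension.
# $1 is the path to a flash-attention git checkout.
build_fa3() {
    repo_root="$1"
    # Populate csrc/cutlass, which provides cute/tensor.hpp through the
    # -I.../csrc/cutlass/include flag seen in the failing nvcc command.
    git -C "$repo_root" submodule update --init csrc/cutlass || return 1
    # Same build steps as in the original report, run in a subshell so the
    # caller's working directory is left unchanged.
    (cd "$repo_root/hopper" && python setup.py install)
}
```

For the paths in this thread the call would be build_fa3 /test/flash-attention. Cloning with git clone --recursive in the first place should avoid the problem as well.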

OK, thanks very much.

The test results are 2 failed, 1726 passed. Is this result normal?

image

GMALP avatar Aug 09 '24 07:08 GMALP

Yes, I'll relax those tests.

tridao avatar Aug 09 '24 16:08 tridao