flash-attention
[QST] Got compilation error when compiling flash-attention-3 with CUDA 12.3
Hi, I got the following compilation error when compiling flash-attention-3 with CUDA 12.3. I read hopper/setup.py and found that it automatically updates nvcc and ptxas. Does flash-attention-3 still support CUDA 12.3?
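As a quick sanity check before building, it can help to compare the local nvcc release against a candidate minimum toolkit version. This is only a sketch: the "12.4" minimum below is an assumption for illustration, not a confirmed FA3 requirement, and the version string is hardcoded where you would normally parse `nvcc --version`.

```shell
#!/bin/sh
# Sketch: compare a CUDA toolkit version against a hypothetical minimum.
# "12.4" is an assumed threshold, not a documented FA3 requirement.
required="12.4"
# In practice, extract this from `nvcc --version`; hardcoded here for illustration.
ver="12.3"
# sort -V orders version strings numerically; if the smaller of the two
# is $required, then $ver >= $required.
if [ "$(printf '%s\n%s\n' "$required" "$ver" | sort -V | head -n1)" = "$required" ]; then
  echo "CUDA $ver meets the $required minimum"
else
  echo "CUDA $ver is below $required; FP8 (e4m3) kernels may fail to compile"
fi
```

With `ver="12.3"` this prints the "below" branch; swapping in the real parsed version makes it a usable pre-build check.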
/workspace/flash-attention/csrc/cutlass/include/cute/atom/copy_traits.hpp(130): error: static assertion failed with "Copy_Traits: dst failed to vectorize into registers. Layout is incompatible with this CopyOp."
static_assert(decltype(size(rD) == Int<RegNumDst>{})::value, "Copy_Traits: dst failed to vectorize into registers. Layout is incompatible with this CopyOp.")
...
instantiation of "void run_mha_fwd_<Arch,T,kHeadDim,kHeadDimV,Split,PagedKVNonTMA,Has_softcap,PackGQA>(Flash_fwd_params &, cudaStream_t) [with Arch=90, T=cutlass::float_e4m3_t, kHeadDim=192, kHeadDimV=128, Split=false, PagedKVNonTMA=false, Has_softcap=false, PackGQA=true]" at line 8 of /workspace/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_e4m3_packgqa_sm90.cu
git commit: 27f501dbe011f4371bff938fe7e09311ab3002fa
Hello, has this been solved?
I hit the same issue when compiling the latest vLLM:
3 errors detected in the compilation of "/home/logs/vllm/.deps/vllm-flash-attn-src/hopper/instantiations/flash_fwd_hdimall_e4m3_paged_sm90.cu".
[294/314] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdimall_e4m3_paged_softcap_sm90.cu.o
FAILED: vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdimall_e4m3_paged_softcap_sm90.cu.o
static_assert(decltype(size(rD) == Int<RegNumDst>{})::value, "Copy_Traits: dst failed to vectorize into registers. Layout is incompatible with this CopyOp.")
Same issue in all e4m3 kernels on Hopper. Still open and not fixed even in vLLM v0.8.5post1. I hope it will be fixed soon.
Does it work with CUDA 12.4 and above?
I can't say. My cluster currently has driver version 545.23.08, so I can't update and have only tried CUDA 12.3. I've filed a request, but it may take a while. Do you believe the compilation error in the vLLM FA3 e4m3 parts could be related to the CUDA version in use?
Thanks a lot for the help.