flash-attention
[QST] Got compilation error when compiling flash-attention-3 with CUDA 12.3
Hi, I got the following compilation error when compiling flash-attention-3 with CUDA 12.3. I read hopper/setup.py and found that it automatically updates nvcc and ptxas. Does flash-attention-3 still support CUDA 12.3?
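As a quick sanity check before building, it can help to compare the local nvcc release against a candidate minimum toolkit version. This is only a sketch: the "12.4" minimum below is an assumption for illustration, not a confirmed FA3 requirement, and the version string is hardcoded where you would normally parse `nvcc --version`.

```shell
#!/bin/sh
# Sketch: compare a CUDA toolkit version against a hypothetical minimum.
# "12.4" is an assumed threshold, not a documented FA3 requirement.
required="12.4"
# In practice, extract this from `nvcc --version`; hardcoded here for illustration.
ver="12.3"
# sort -V orders version strings numerically; if the smaller of the two
# is $required, then $ver >= $required.
if [ "$(printf '%s\n%s\n' "$required" "$ver" | sort -V | head -n1)" = "$required" ]; then
  echo "CUDA $ver meets the $required minimum"
else
  echo "CUDA $ver is below $required; FP8 (e4m3) kernels may fail to compile"
fi
```

With `ver="12.3"` this prints the "below" branch; swapping in the real parsed version makes it a usable pre-build check.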
/workspace/flash-attention/csrc/cutlass/include/cute/atom/copy_traits.hpp(130): error: static assertion failed with "Copy_Traits: dst failed to vectorize into registers. Layout is incompatible with this CopyOp."
static_assert(decltype(size(rD) == Int<RegNumDst>{})::value, "Copy_Traits: dst failed to vectorize into registers. Layout is incompatible with this CopyOp.")
...
instantiation of "void run_mha_fwd_<Arch,T,kHeadDim,kHeadDimV,Split,PagedKVNonTMA,Has_softcap,PackGQA>(Flash_fwd_params &, cudaStream_t) [with Arch=90, T=cutlass::float_e4m3_t, kHeadDim=192, kHeadDimV=128, Split=false, PagedKVNonTMA=false, Has_softcap=false, PackGQA=true]" at line 8 of /workspace/flash-attention/hopper/instantiations/flash_fwd_hdim192_128_e4m3_packgqa_sm90.cu
git commit: 27f501dbe011f4371bff938fe7e09311ab3002fa
Hello, has this been solved?
I hit the same issue when compiling the latest vLLM:
3 errors detected in the compilation of "/home/logs/vllm/.deps/vllm-flash-attn-src/hopper/instantiations/flash_fwd_hdimall_e4m3_paged_sm90.cu".
[294/314] Building CUDA object vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdimall_e4m3_paged_softcap_sm90.cu.o
FAILED: vllm-flash-attn/CMakeFiles/_vllm_fa3_C.dir/hopper/instantiations/flash_fwd_hdimall_e4m3_paged_softcap_sm90.cu.o
static_assert(decltype(size(rD) == Int<RegNumDst>{})::value, "Copy_Traits: dst failed to vectorize into registers. Layout is incompatible with this CopyOp.")
Same issue in all e4m3 kernels on Hopper. Still open and not fixed even in vLLM v0.8.5post1. I hope it will be fixed soon.
Does it work with CUDA 12.4 and above?
I can't say. My cluster currently has driver version 545.23.08, so I can't update and have only tried CUDA 12.3. I've filed a request, but it may take a while. Do you believe the compilation error in the vLLM FA3 e4m3 parts could be related to the CUDA version in use?
Thanks a lot for the help.