RuntimeError: Error compiling objects for extension

Open · lllyyyqqq opened this issue 1 year ago · 8 comments

I am installing flash-attn inside a container. The image and environment are as follows:

OS: Ubuntu 16.04.6
PyTorch image: nvcr.io/nvidia/pytorch:22.04-py3
PyTorch version: 1.12.0a0+bd13bc6
CUDA: 11.6
GPU: V100 (32 GB)
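The ptxas log below shows every kernel being compiled for 'sm_80', while a V100 is a compute capability 7.0 (Volta) card. Separately from why the build stopped, it is worth checking whether such kernels could run on this GPU at all. Under CUDA's binary-compatibility rule, SASS built for compute capability X.y runs only on devices of capability X.z with z >= y (embedded PTX can be JIT-compiled for newer GPUs, never older ones). The sketch below encodes that SASS rule; the small GPU-to-capability table is a hand-written assumption covering only a few cards, and on a live machine `torch.cuda.get_device_capability()` reports the tuple directly.

```python
import re
from typing import Tuple

# Hand-written lookup for illustration only; not exhaustive (assumption).
# On a live system, torch.cuda.get_device_capability() returns this tuple.
KNOWN_CAPABILITIES = {
    "V100": (7, 0),
    "T4": (7, 5),
    "A100": (8, 0),
    "RTX 3090": (8, 6),
    "H100": (9, 0),
}


def parse_sm(arch: str) -> Tuple[int, int]:
    """Turn a ptxas target such as 'sm_80' into (major, minor) = (8, 0)."""
    match = re.fullmatch(r"sm_(\d+)(\d)", arch)
    if match is None:
        raise ValueError(f"unrecognised arch string: {arch!r}")
    return int(match.group(1)), int(match.group(2))


def sass_runs_on(arch: str, device_cc: Tuple[int, int]) -> bool:
    """CUDA binary compatibility: SASS for X.y runs only on X.z with z >= y."""
    build_major, build_minor = parse_sm(arch)
    dev_major, dev_minor = device_cc
    return dev_major == build_major and dev_minor >= build_minor


if __name__ == "__main__":
    for gpu in ("V100", "A100", "RTX 3090"):
        ok = sass_runs_on("sm_80", KNOWN_CAPABILITIES[gpu])
        print(f"{gpu}: sm_80 SASS {'can' if ok else 'cannot'} run natively")
```

With these inputs the loop reports that sm_80 SASS cannot run natively on the V100 but can on the A100 and RTX 3090.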

Command: pip install flash-attn --no-build-isolation
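Because ninja runs many nvcc jobs in parallel, the compiler message that actually explains a failure is easy to lose among the `ptxas info` lines, and many parallel jobs can also exhaust host RAM. The flash-attn README documents a `MAX_JOBS` environment variable for capping parallel compilation jobs, and `torch.utils.cpp_extension` (the module in the traceback below) passes it to ninja as its job count. The sketch below rebuilds the same pip command with `MAX_JOBS` set and `--verbose` added, and writes all output to a file; the helper names and the `flash_attn_build.log` path are hypothetical.

```python
import os
import subprocess
import sys
from typing import Dict, List, Tuple


def build_install_command(max_jobs: int) -> Tuple[List[str], Dict[str, str]]:
    """Return argv and environment for the same install with capped parallelism."""
    if max_jobs < 1:
        raise ValueError("max_jobs must be at least 1")
    argv = [sys.executable, "-m", "pip", "install", "flash-attn",
            "--no-build-isolation", "--verbose"]
    env = dict(os.environ)
    # torch.utils.cpp_extension reads MAX_JOBS and passes it to ninja as -j.
    env["MAX_JOBS"] = str(max_jobs)
    return argv, env


def run_and_log(argv: List[str], env: Dict[str, str], log_path: str) -> int:
    """Run a command, sending stdout and stderr to log_path; return the exit code."""
    with open(log_path, "w", encoding="utf-8") as log:
        proc = subprocess.run(argv, env=env, stdout=log,
                              stderr=subprocess.STDOUT, check=False)
    return proc.returncode


if __name__ == "__main__":
    argv, env = build_install_command(max_jobs=4)
    print("MAX_JOBS=" + env["MAX_JOBS"], " ".join(argv[2:]))
    # The real build takes a long time; uncomment to run it:
    # exit_code = run_and_log(argv, env, "flash_attn_build.log")
```

Keeping the full log in a file makes it possible to search for the first real error instead of relying on the tail that pip prints.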

Part of the error output:

    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 254 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb0ELb1ELb1ELb1ELb0EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb0ELb1ELb1ELb1ELb0EEv16Flash_fwd_params
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 255 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb0ELb0ELb0EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb0ELb0ELb0EEv16Flash_fwd_params
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 246 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb0ELb0ELb1EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb0ELb0ELb1EEv16Flash_fwd_params
    104 bytes stack frame, 140 bytes spill stores, 132 bytes spill loads
ptxas info : Used 255 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb0ELb1ELb0EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb0ELb1ELb0EEv16Flash_fwd_params
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 236 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb0ELb1ELb1EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb0ELb1ELb1EEv16Flash_fwd_params
    128 bytes stack frame, 104 bytes spill stores, 96 bytes spill loads
ptxas info : Used 255 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb1ELb0ELb0EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb1ELb0ELb0EEv16Flash_fwd_params
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 242 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb1ELb0ELb1EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb1ELb0ELb1EEv16Flash_fwd_params
    96 bytes stack frame, 140 bytes spill stores, 132 bytes spill loads
ptxas info : Used 255 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb1ELb1ELb0EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb1ELb1ELb0EEv16Flash_fwd_params
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 236 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb1ELb1ELb1EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb1ELb1ELb1EEv16Flash_fwd_params
    120 bytes stack frame, 100 bytes spill stores, 92 bytes spill loads
ptxas info : Used 255 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb0ELb0ELb0EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb0ELb0ELb0EEv16Flash_fwd_params
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 240 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb0ELb0ELb1EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb0ELb0ELb1EEv16Flash_fwd_params
    96 bytes stack frame, 70 bytes spill stores, 70 bytes spill loads
ptxas info : Used 255 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb0ELb1ELb0EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb0ELb1ELb0EEv16Flash_fwd_params
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 236 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb0ELb1ELb1EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb0ELb1ELb1EEv16Flash_fwd_params
    112 bytes stack frame, 96 bytes spill stores, 88 bytes spill loads
ptxas info : Used 255 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb1ELb0ELb0EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb1ELb0ELb0EEv16Flash_fwd_params
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 244 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb1ELb0ELb1EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb1ELb0ELb1EEv16Flash_fwd_params
    80 bytes stack frame, 58 bytes spill stores, 58 bytes spill loads
ptxas info : Used 255 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb1ELb1ELb0EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb1ELb1ELb0EEv16Flash_fwd_params
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 236 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb1ELb1ELb1EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb1ELb1ELb1EEv16Flash_fwd_params
    120 bytes stack frame, 100 bytes spill stores, 92 bytes spill loads
ptxas info : Used 255 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb0ELb0ELb0EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb0ELb0ELb0EEv16Flash_fwd_params
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 246 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb0ELb0ELb1EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb0ELb0ELb1EEv16Flash_fwd_params
    112 bytes stack frame, 148 bytes spill stores, 140 bytes spill loads
ptxas info : Used 255 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb0ELb1ELb0EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb0ELb1ELb0EEv16Flash_fwd_params
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 238 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb0ELb1ELb1EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb0ELb1ELb1EEv16Flash_fwd_params
    120 bytes stack frame, 96 bytes spill stores, 88 bytes spill loads
ptxas info : Used 255 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb1ELb0ELb0EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb1ELb0ELb0EEv16Flash_fwd_params
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 242 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb1ELb0ELb1EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb1ELb0ELb1EEv16Flash_fwd_params
    104 bytes stack frame, 140 bytes spill stores, 132 bytes spill loads
ptxas info : Used 255 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb1ELb1ELb0EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb1ELb1ELb0EEv16Flash_fwd_params
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 246 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb1ELb1ELb1EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb1ELb1ELb1EEv16Flash_fwd_params
    120 bytes stack frame, 108 bytes spill stores, 100 bytes spill loads
ptxas info : Used 255 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb0ELb0EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb0ELb0EEv16Flash_fwd_params
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 244 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb0ELb1EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb0ELb1EEv16Flash_fwd_params
    96 bytes stack frame, 208 bytes spill stores, 184 bytes spill loads
ptxas info : Used 255 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb1ELb0EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb1ELb0EEv16Flash_fwd_params
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 255 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb1ELb1EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb1ELb1EEv16Flash_fwd_params
    160 bytes stack frame, 140 bytes spill stores, 116 bytes spill loads
ptxas info : Used 255 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb0ELb0EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb0ELb0EEv16Flash_fwd_params
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 242 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb0ELb1EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb0ELb1EEv16Flash_fwd_params
    96 bytes stack frame, 140 bytes spill stores, 112 bytes spill loads
ptxas info : Used 255 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb1ELb0EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb1ELb0EEv16Flash_fwd_params
    16 bytes stack frame, 16 bytes spill stores, 16 bytes spill loads
ptxas info : Used 255 registers, 576 bytes cmem[0]
ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb1ELb1EEv16Flash_fwd_params' for 'sm_80'
ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb1ELb1EEv16Flash_fwd_params
    88 bytes stack frame, 96 bytes spill stores, 124 bytes spill loads
ptxas info : Used 255 registers, 576 bytes cmem[0]
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1789, in _run_ninja_build
    subprocess.run(
  File "/opt/conda/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-install-l8io2z6u/flash-attn_e9281ded0bb84ceca497ceb52704bbc6/setup.py", line 201, in <module>
    setup(
  File "/opt/conda/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/opt/conda/lib/python3.8/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/opt/conda/lib/python3.8/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.8/site-packages/setuptools/command/install.py", line 68, in run
    return orig.install.run(self)
  File "/opt/conda/lib/python3.8/distutils/command/install.py", line 545, in run
    self.run_command('build')
  File "/opt/conda/lib/python3.8/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.8/distutils/command/build.py", line 135, in run
    self.run_command(cmd_name)
  File "/opt/conda/lib/python3.8/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
    _build_ext.run(self)
  File "/opt/conda/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
    _build_ext.build_ext.run(self)
  File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 340, in run
    self.build_extensions()
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 763, in build_extensions
    build_ext.build_extensions(self)
  File "/opt/conda/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
    _build_ext.build_ext.build_extensions(self)
  File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
    self._build_extensions_serial()
  File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
    self.build_extension(ext)
  File "/opt/conda/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
    _build_ext.build_extension(self, ext)
  File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 528, in build_extension
    objects = self.compiler.compile(sources,
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 584, in unix_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1468, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1805, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
----------------------------------------

ERROR: Command errored out with exit status 1: /opt/conda/bin/python3.8 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-l8io2z6u/flash-attn_e9281ded0bb84ceca497ceb52704bbc6/setup.py'"'"'; file='"'"'/tmp/pip-install-l8io2z6u/flash-attn_e9281ded0bb84ceca497ceb52704bbc6/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-2e7eto0b/install-record.txt --single-version-externally-managed --compile --install-headers /opt/conda/include/python3.8/flash-attn Check the logs for full command output.
info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb1ELb1ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb1ELb1ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 246 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb1ELb1ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb1ELb1ELb1EEv16Flash_fwd_params 120 bytes stack frame, 108 bytes spill stores, 100 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb0ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb0ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 244 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb0ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for 
_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb0ELb1EEv16Flash_fwd_params 96 bytes stack frame, 208 bytes spill stores, 184 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb1ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb1ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb1ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb1ELb1EEv16Flash_fwd_params 160 bytes stack frame, 140 bytes spill stores, 116 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb0ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb0ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 242 registers, 576 bytes 
cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb0ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb0ELb1EEv16Flash_fwd_params 96 bytes stack frame, 140 bytes spill stores, 112 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb1ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb1ELb0EEv16Flash_fwd_params 16 bytes stack frame, 16 bytes spill stores, 16 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb1ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb1ELb1EEv16Flash_fwd_params 88 bytes stack frame, 96 bytes spill stores, 124 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ninja: build stopped: subcommand failed. 
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1789, in _run_ninja_build
    subprocess.run(
  File "/opt/conda/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-install-l8io2z6u/flash-attn_e9281ded0bb84ceca497ceb52704bbc6/setup.py", line 201, in <module>
    setup(
  File "/opt/conda/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/opt/conda/lib/python3.8/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/opt/conda/lib/python3.8/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.8/site-packages/setuptools/command/install.py", line 68, in run
    return orig.install.run(self)
  File "/opt/conda/lib/python3.8/distutils/command/install.py", line 545, in run
    self.run_command('build')
  File "/opt/conda/lib/python3.8/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.8/distutils/command/build.py", line 135, in run
    self.run_command(cmd_name)
  File "/opt/conda/lib/python3.8/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
    _build_ext.run(self)
  File "/opt/conda/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
    _build_ext.build_ext.run(self)
  File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 340, in run
    self.build_extensions()
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 763, in build_extensions
    build_ext.build_extensions(self)
  File "/opt/conda/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
    _build_ext.build_ext.build_extensions(self)
  File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
    self._build_extensions_serial()
  File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
    self.build_extension(ext)
  File "/opt/conda/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
    _build_ext.build_extension(self, ext)
  File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 528, in build_extension
    objects = self.compiler.compile(sources,
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 584, in unix_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1468, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1805, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
----------------------------------------

ERROR: Command errored out with exit status 1: /opt/conda/bin/python3.8 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-l8io2z6u/flash-attn_e9281ded0bb84ceca497ceb52704bbc6/setup.py'"'"'; file='"'"'/tmp/pip-install-l8io2z6u/flash-attn_e9281ded0bb84ceca497ceb52704bbc6/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-2e7eto0b/install-record.txt --single-version-externally-managed --compile --install-headers /opt/conda/include/python3.8/flash-attn Check the logs for full command output.

lllyyyqqq, Jul 19 '23
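A note on reading logs like the one above: the `ptxas info` lines are per-kernel register and spill reports, not errors, and `RuntimeError: Error compiling objects for extension` only says that ninja exited with a non-zero status. The line that explains the failure is usually a compiler `error:` diagnostic, a ninja `FAILED:` line, or a `Killed` message, and it can sit thousands of lines earlier. Below is a minimal sketch for pulling such lines out of a saved multi-line build log. The patterns are heuristics chosen for illustration, not an exhaustive list, and `find_failure_lines` is a hypothetical helper, not part of flash-attn or PyTorch.

```python
import re
from typing import List

# Heuristic markers of the real failure in a torch.utils.cpp_extension / ninja build log.
# These patterns are illustrative assumptions, not an exhaustive list.
FAILURE_PATTERNS = [
    re.compile(r"\berror\b", re.IGNORECASE),  # nvcc/gcc diagnostics such as "file.cu(10): error: ..."
    re.compile(r"^FAILED:"),                  # ninja names the object file it could not build
    re.compile(r"\bKilled\b"),                # the OS killed a compiler process (often out of memory)
]

# Informational or generic lines that would otherwise drown out the real cause.
SKIP_PATTERN = re.compile(
    r"ptxas info|Function properties for|bytes stack frame|Error compiling objects for extension"
)


def find_failure_lines(log_text: str, limit: int = 20) -> List[str]:
    """Return up to `limit` lines that look like the actual cause of a failed build."""
    hits: List[str] = []
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if not line or SKIP_PATTERN.search(line):
            continue
        if any(pattern.search(line) for pattern in FAILURE_PATTERNS):
            hits.append(line)
            if len(hits) >= limit:
                break
    return hits
```

One way to get a log to feed it is `pip install flash-attn --no-build-isolation -v 2>&1 | tee build.log`, then `find_failure_lines(open("build.log").read())`. Generic pip lines such as `ERROR: Command errored out` will also match; they are harmless but not informative.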

I have a similar error when building on Ubuntu 22.04 with CUDA 11.7 and PyTorch 2.0.1. If I find a solution and don't forget, I'll post it here.

SkibaSAY, Jul 20 '23

Same... I'm on Ubuntu 22.04, CUDA 12.0, Torch 2.0.1+cu118, and I get this error:

      ninja: build stopped: subcommand failed.
      Traceback (most recent call last):
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
          subprocess.run(
        File "/usr/lib/python3.10/subprocess.py", line 524, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

      The above exception was the direct cause of the following exception:

      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-yhdeng66/flash-attn_79d88b3729674f029fef6777f74603c3/setup.py", line 202, in <module>
          setup(
        File "/usr/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 107, in setup
          return distutils.core.setup(**attrs)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 185, in setup
          return run_commands(dist)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 201, in run_commands
          dist.run_commands()
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 969, in run_commands
          self.run_command(cmd)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 1234, in run_command
          super().run_command(command)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/usr/local/lib/python3.10/dist-packages/wheel/bdist_wheel.py", line 343, in run
          self.run_command("build")
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 1234, in run_command
          super().run_command(command)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build.py", line 131, in run
          self.run_command(cmd_name)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 1234, in run_command
          super().run_command(command)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/usr/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 84, in run
          _build_ext.run(self)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
          self.build_extensions()
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
          build_ext.build_extensions(self)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
          self._build_extensions_serial()
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
          self.build_extension(ext)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 246, in build_extension
          _build_ext.build_extension(self, ext)
        File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
          objects = self.compiler.compile(
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
          _write_ninja_file_and_compile_objects(
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1574, in _write_ninja_file_and_compile_objects
          _run_ninja_build(
        File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
          raise RuntimeError(message) from e
      RuntimeError: Error compiling objects for extension
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for flash-attn
  Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects

chigkim, Aug 06 '23
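The setup described above pairs a CUDA 12.0 toolkit with a PyTorch wheel built for CUDA 11.8 (`+cu118`). PyTorch's extension builder compares the `nvcc` found under `CUDA_HOME` with `torch.version.cuda`; in recent PyTorch versions a major-version difference is rejected and a minor one only produces a warning. Nothing in this thread confirms that a mismatch caused this particular failure, but it is worth ruling out. A minimal sketch for comparing the two version strings; `parse_cuda_version` and `cuda_mismatch` are hypothetical helpers.

```python
import re
from typing import Optional, Tuple


def parse_cuda_version(text: str) -> Tuple[int, int]:
    """Parse (major, minor) from '11.8' or from `nvcc --version` output ('release 11.8, V11.8.89')."""
    # Prefer the "release X.Y" field of nvcc output, then fall back to the first X.Y in the text.
    match = re.search(r"release (\d+)\.(\d+)", text) or re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no CUDA version found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def cuda_mismatch(nvcc_output: str, torch_cuda: Optional[str]) -> Optional[str]:
    """Describe a toolkit/PyTorch CUDA mismatch, or return None when the versions agree."""
    if torch_cuda is None:
        return "PyTorch was built without CUDA (torch.version.cuda is None)"
    toolkit = parse_cuda_version(nvcc_output)
    built_for = parse_cuda_version(torch_cuda)
    if toolkit[0] != built_for[0]:
        return f"major mismatch: nvcc {toolkit[0]}.{toolkit[1]} vs torch CUDA {built_for[0]}.{built_for[1]}"
    if toolkit != built_for:
        return f"minor mismatch: nvcc {toolkit[0]}.{toolkit[1]} vs torch CUDA {built_for[0]}.{built_for[1]}"
    return None
```

Pass it the output of `nvcc --version` for the toolkit the build will actually use (`torch.utils.cpp_extension.CUDA_HOME` shows which one PyTorch picked) and the value of `torch.version.cuda`.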

Similar error here:

ptxas info    : Function properties for _Z25flash_bwd_dot_do_o_kernelILb1E23Flash_bwd_kernel_traitsILi256ELi64ELi64ELi8ELi4ELi2ELi2ELb0ELb0EN7cutlass10bfloat16_tE19Flash_kernel_traitsILi256ELi64ELi64ELi8ES2_EEEv16Flash_bwd_params
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 80 registers, 696 bytes cmem[0]
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
    subprocess.run(
  File "/opt/conda/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "setup.py", line 271, in <module>
    setup(
  File "/opt/conda/lib/python3.8/site-packages/setuptools/__init__.py", line 107, in setup
    return distutils.core.setup(**attrs)
  File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
  File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/opt/conda/lib/python3.8/site-packages/setuptools/dist.py", line 1244, in run_command
    super().run_command(command)
  File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.8/site-packages/setuptools/command/install.py", line 80, in run
    self.do_egg_install()
  File "/opt/conda/lib/python3.8/site-packages/setuptools/command/install.py", line 129, in do_egg_install
    self.run_command('bdist_egg')
  File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/lib/python3.8/site-packages/setuptools/dist.py", line 1244, in run_command
    super().run_command(command)
  File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.8/site-packages/setuptools/command/bdist_egg.py", line 164, in run
    cmd = self.call_command('install_lib', warn_dir=0)
  File "/opt/conda/lib/python3.8/site-packages/setuptools/command/bdist_egg.py", line 150, in call_command
    self.run_command(cmdname)
  File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/lib/python3.8/site-packages/setuptools/dist.py", line 1244, in run_command
    super().run_command(command)
  File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.8/site-packages/setuptools/command/install_lib.py", line 11, in run
    self.build()
  File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/command/install_lib.py", line 111, in build
    self.run_command('build_ext')
  File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/lib/python3.8/site-packages/setuptools/dist.py", line 1244, in run_command
    super().run_command(command)
  File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 84, in run
    _build_ext.run(self)
  File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
    self.build_extensions()
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
    build_ext.build_extensions(self)
  File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
    self._build_extensions_serial()
  File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
    self.build_extension(ext)
  File "/opt/conda/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
    _build_ext.build_extension(self, ext)
  File "/opt/conda/lib/python3.8/site-packages/Cython/Distutils/build_ext.py", line 127, in build_extension
    super(build_ext, self).build_extension(ext)
  File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
    objects = self.compiler.compile(
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1574, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension

bingxuchai, Sep 05 '23

We have pre-built CUDA wheels now that setup.py will automatically download.

tridao, Sep 05 '23
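Roughly speaking, setup.py derives a wheel file name from the flash-attn version, the CUDA version, the PyTorch major.minor version, the C++11 ABI flag of the PyTorch build, and the Python tag, tries to download that file from the project's GitHub releases, and falls back to compiling from source when the download fails. The sketch below rebuilds such a name so it can be compared by hand against the release assets. The naming pattern (`flash_attn-<version>+cu<cuda>torch<major.minor>cxx11abi<TRUE|FALSE>-<pytag>-<pytag>-<platform>.whl`) is an assumption based on published release file names and has changed between releases; `expected_wheel_name` is a hypothetical helper.

```python
def expected_wheel_name(
    flash_version: str,
    cuda_tag: str,
    torch_version: str,
    cxx11_abi: bool,
    python_tag: str,
    platform_tag: str = "linux_x86_64",
) -> str:
    """Build a flash-attn wheel file name under an assumed naming scheme (illustrative only)."""
    # Only PyTorch's major.minor is assumed to matter, e.g. "2.3.0+cu121" -> "2.3".
    torch_major_minor = ".".join(torch_version.split("+")[0].split(".")[:2])
    abi = "TRUE" if cxx11_abi else "FALSE"
    return (
        f"flash_attn-{flash_version}+{cuda_tag}torch{torch_major_minor}cxx11abi{abi}"
        f"-{python_tag}-{python_tag}-{platform_tag}.whl"
    )
```

The local inputs can be read from `torch.__version__`, `torch.compiled_with_cxx11_abi()` and `f"cp{sys.version_info.major}{sys.version_info.minor}"`. If no release asset matches, pip ends up compiling the extension, which is the step failing throughout this thread.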

CUDA 12.3:

FAILED: /mnt/c/Users/luoru/Desktop/flash-attention/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim64_bf16_sm80.o
/usr/local/cuda/bin/nvcc -I/mnt/c/Users/luoru/Desktop/flash-attention/csrc/flash_attn -I/mnt/c/Users/luoru/Desktop/flash-attention/csrc/flash_attn/src -I/mnt/c/Users/luoru/Desktop/flash-attention/csrc/cutlass/include -I/home/luoruofeng/miniconda3/lib/python3.11/site-packages/torch/include -I/home/luoruofeng/miniconda3/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/luoruofeng/miniconda3/lib/python3.11/site-packages/torch/include/TH -I/home/luoruofeng/miniconda3/lib/python3.11/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/luoruofeng/miniconda3/include/python3.11 -c -c /mnt/c/Users/luoru/Desktop/flash-attention/csrc/flash_attn/src/flash_bwd_hdim64_bf16_sm80.cu -o /mnt/c/Users/luoru/Desktop/flash-attention/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim64_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
Killed
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 2100, in _run_ninja_build
    subprocess.run(
  File "/home/luoruofeng/miniconda3/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/c/Users/luoru/Desktop/flash-attention/setup.py", line 285, in <module>
    setup(
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/setuptools/__init__.py", line 103, in setup
    return distutils.core.setup(**attrs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
           ^^^^^^^^^^^^^^^^^^
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/setuptools/dist.py", line 989, in run_command
    super().run_command(command)
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/setuptools/command/install.py", line 84, in run
    self.do_egg_install()
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/setuptools/command/install.py", line 132, in do_egg_install
    self.run_command('bdist_egg')
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/setuptools/dist.py", line 989, in run_command
    super().run_command(command)
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/setuptools/command/bdist_egg.py", line 167, in run
    cmd = self.call_command('install_lib', warn_dir=0)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/setuptools/command/bdist_egg.py", line 153, in call_command
    self.run_command(cmdname)
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/setuptools/dist.py", line 989, in run_command
    super().run_command(command)
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/setuptools/command/install_lib.py", line 11, in run
    self.build()
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/setuptools/_distutils/command/install_lib.py", line 111, in build
    self.run_command('build_ext')
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/setuptools/dist.py", line 989, in run_command
    super().run_command(command)
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/setuptools/command/build_ext.py", line 88, in run
    _build_ext.run(self)
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
    self.build_extensions()
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 873, in build_extensions
    build_ext.build_extensions(self)
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
    self._build_extensions_serial()
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
    self.build_extension(ext)
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/setuptools/command/build_ext.py", line 249, in build_extension
    _build_ext.build_extension(self, ext)
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
    objects = self.compiler.compile(
              ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 686, in unix_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1774, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 2116, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension

luoruofeng, Jan 01 '24
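The nvcc command in the previous comment generates code only for `sm_80` and `sm_90` (`-gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90`). Because `code=sm_XX` embeds machine code (SASS) rather than PTX, a GPU can only use it if it has the same major compute capability and an equal or higher minor one. The sketch below encodes that rule; `BUILT_ARCHS` mirrors the flags in that command, and `runs_built_kernels` is a hypothetical name. It checks whether a finished build could run on a given GPU; it does not explain why the compile step was `Killed`.

```python
from typing import Iterable, Tuple

# Compute capabilities targeted by the -gencode flags in the nvcc command (SASS only, no PTX).
BUILT_ARCHS: Tuple[Tuple[int, int], ...] = ((8, 0), (9, 0))


def runs_built_kernels(
    device_capability: Tuple[int, int],
    built_archs: Iterable[Tuple[int, int]] = BUILT_ARCHS,
) -> bool:
    """True if SASS built for one of `built_archs` is binary-compatible with the device.

    Machine code for sm_XY runs on devices with the same major version X and a
    minor version >= Y (e.g. sm_80 code runs on sm_86, but not on sm_70).
    """
    major, minor = device_capability
    return any(major == b_major and minor >= b_minor for b_major, b_minor in built_archs)
```

With PyTorch and a GPU available, `runs_built_kernels(torch.cuda.get_device_capability())` answers this for the current device; a V100 reports `(7, 0)` and fails the check.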

Same error. Has anyone gotten this working? Please let me know.

userdsr, Feb 02 '24

Have you solved it? I have run into the same problem. If you could share the solution, it would be much appreciated.

Lucyliuwen, Feb 29 '24

I'm getting the same error. I'd like to compile it myself rather than use the pre-built CUDA wheels. Does anyone know why this happens? I saw the message "Killed" in the output, so I tried it on a machine with 60 GiB of RAM, but it still failed.

Numeri, Apr 30 '24
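On the `Killed` message: it usually means the operating system killed a compiler process, most often because memory ran out. The tracebacks in this thread show PyTorch running `['ninja', '-v']` without a `-j` option, which is what happens when the `MAX_JOBS` environment variable is unset; ninja then picks its own parallelism (roughly one job per CPU core), and each nvcc job here also runs with `--threads 4`. Peak memory therefore grows with core count rather than being fixed, which may explain why 60 GiB was not enough on a many-core machine. The flash-attention README recommends setting `MAX_JOBS` to limit parallel compilation jobs. A sketch for picking a value; the 8 GiB-per-job budget is an assumption for illustration, not a measured figure, and both helpers are hypothetical.

```python
import os


def total_ram_gib() -> float:
    """Physical RAM in GiB via POSIX sysconf (works on Linux; may be unavailable elsewhere)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    page_count = os.sysconf("SC_PHYS_PAGES")
    return page_size * page_count / 2**30


def suggest_max_jobs(ram_gib: float, cpu_count: int, gib_per_job: float = 8.0) -> int:
    """Cap parallel compile jobs by both CPU count and an assumed per-job memory budget."""
    jobs_by_ram = int(ram_gib // gib_per_job)
    return max(1, min(cpu_count, jobs_by_ram))
```

For example, `suggest_max_jobs(total_ram_gib(), os.cpu_count() or 1)`, then pass the result on the command line, e.g. `MAX_JOBS=4 pip install flash-attn --no-build-isolation`. To force a source build even when a pre-built wheel exists, recent versions of setup.py read a `FLASH_ATTENTION_FORCE_BUILD=TRUE` environment variable; check the setup.py in your checkout to confirm.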