flash-attention
flash-attention copied to clipboard
RuntimeError: Error compiling objects for extension
I am installing flash-attn in image, container environment as follow:
Ubuntu 16.04.6
pytorch image: nvcr.io/nvidia/pytorch: 22.04-py3
PyTorch Version 1.12.0a0+bd13bc6
CUDA 11.6
My card is V100-32g
Command
pip install flash-attn --no-build-isolation
Part of Errors: 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 254 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb0ELb1ELb1ELb1ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb0ELb1ELb1ELb1ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb0ELb0ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb0ELb0ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 246 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb0ELb0ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb0ELb0ELb1EEv16Flash_fwd_params 104 bytes stack frame, 140 bytes spill stores, 132 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb0ELb1ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb0ELb1ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 236 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb0ELb1ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb0ELb1ELb1EEv16Flash_fwd_params 128 bytes stack frame, 104 bytes spill stores, 96 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb1ELb0ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb1ELb0ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 242 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb1ELb0ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb1ELb0ELb1EEv16Flash_fwd_params 96 bytes stack frame, 140 bytes spill stores, 132 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb1ELb1ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb1ELb1ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 236 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb1ELb1ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb1ELb1ELb1EEv16Flash_fwd_params 120 bytes stack frame, 100 bytes spill stores, 92 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb0ELb0ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb0ELb0ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 240 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb0ELb0ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb0ELb0ELb1EEv16Flash_fwd_params 96 bytes stack frame, 70 bytes spill stores, 70 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb0ELb1ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb0ELb1ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 236 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb0ELb1ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb0ELb1ELb1EEv16Flash_fwd_params 112 bytes stack frame, 96 bytes spill stores, 88 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb1ELb0ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb1ELb0ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 244 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb1ELb0ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb1ELb0ELb1EEv16Flash_fwd_params 80 bytes stack frame, 58 bytes spill stores, 58 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb1ELb1ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb1ELb1ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 236 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb1ELb1ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb1ELb1ELb1EEv16Flash_fwd_params 120 bytes stack frame, 100 bytes spill stores, 92 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb0ELb0ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb0ELb0ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 246 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb0ELb0ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb0ELb0ELb1EEv16Flash_fwd_params 112 bytes stack frame, 148 bytes spill stores, 140 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb0ELb1ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb0ELb1ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 238 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb0ELb1ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb0ELb1ELb1EEv16Flash_fwd_params 120 bytes stack frame, 96 bytes spill stores, 88 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb1ELb0ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb1ELb0ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 242 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb1ELb0ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb1ELb0ELb1EEv16Flash_fwd_params 104 bytes stack frame, 140 bytes spill stores, 132 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb1ELb1ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb1ELb1ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 246 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb1ELb1ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb1ELb1ELb1EEv16Flash_fwd_params 120 bytes stack frame, 108 bytes spill stores, 100 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb0ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb0ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 244 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb0ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb0ELb1EEv16Flash_fwd_params 96 bytes stack frame, 208 bytes spill stores, 184 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb1ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb1ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb1ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb1ELb1EEv16Flash_fwd_params 160 bytes stack frame, 140 bytes spill stores, 116 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb0ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb0ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 242 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb0ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb0ELb1EEv16Flash_fwd_params 96 bytes stack frame, 140 bytes spill stores, 112 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb1ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb1ELb0EEv16Flash_fwd_params 16 bytes stack frame, 16 bytes spill stores, 16 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb1ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb1ELb1EEv16Flash_fwd_params 88 bytes stack frame, 96 bytes spill stores, 124 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ninja: build stopped: subcommand failed. Traceback (most recent call last): File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1789, in _run_ninja_build subprocess.run( File "/opt/conda/lib/python3.8/subprocess.py", line 516, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-l8io2z6u/flash-attn_e9281ded0bb84ceca497ceb52704bbc6/setup.py", line 201, in <module>
setup(
File "/opt/conda/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/opt/conda/lib/python3.8/distutils/core.py", line 148, in setup
dist.run_commands()
File "/opt/conda/lib/python3.8/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/opt/conda/lib/python3.8/site-packages/setuptools/command/install.py", line 68, in run
return orig.install.run(self)
File "/opt/conda/lib/python3.8/distutils/command/install.py", line 545, in run
self.run_command('build')
File "/opt/conda/lib/python3.8/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/opt/conda/lib/python3.8/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/opt/conda/lib/python3.8/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/opt/conda/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/opt/conda/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 763, in build_extensions
build_ext.build_extensions(self)
File "/opt/conda/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
_build_ext.build_ext.build_extensions(self)
File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
self._build_extensions_serial()
File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
self.build_extension(ext)
File "/opt/conda/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
_build_ext.build_extension(self, ext)
File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 528, in build_extension
objects = self.compiler.compile(sources,
File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 584, in unix_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1468, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1805, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
----------------------------------------
ERROR: Command errored out with exit status 1: /opt/conda/bin/python3.8 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-l8io2z6u/flash-attn_e9281ded0bb84ceca497ceb52704bbc6/setup.py'"'"'; file='"'"'/tmp/pip-install-l8io2z6u/flash-attn_e9281ded0bb84ceca497ceb52704bbc6/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-2e7eto0b/install-record.txt --single-version-externally-managed --compile --install-headers /opt/conda/include/python3.8/flash-attn Check the logs for full command output. 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 254 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb0ELb1ELb1ELb1ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb0ELb1ELb1ELb1ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb0ELb0ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb0ELb0ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 246 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb0ELb0ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb0ELb0ELb1EEv16Flash_fwd_params 104 bytes stack frame, 140 bytes spill stores, 132 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb0ELb1ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb0ELb1ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 236 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb0ELb1ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb0ELb1ELb1EEv16Flash_fwd_params 128 bytes stack frame, 104 bytes spill stores, 96 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb1ELb0ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb1ELb0ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 242 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb1ELb0ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb1ELb0ELb1EEv16Flash_fwd_params 96 bytes stack frame, 140 bytes spill stores, 132 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb1ELb1ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb1ELb1ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 236 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb1ELb1ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb0ELb1ELb1ELb1EEv16Flash_fwd_params 120 bytes stack frame, 100 bytes spill stores, 92 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb0ELb0ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb0ELb0ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 240 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb0ELb0ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb0ELb0ELb1EEv16Flash_fwd_params 96 bytes stack frame, 70 bytes spill stores, 70 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb0ELb1ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb0ELb1ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 236 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb0ELb1ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb0ELb1ELb1EEv16Flash_fwd_params 112 bytes stack frame, 96 bytes spill stores, 88 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb1ELb0ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb1ELb0ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 244 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb1ELb0ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb1ELb0ELb1EEv16Flash_fwd_params 80 bytes stack frame, 58 bytes spill stores, 58 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb1ELb1ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb1ELb1ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 236 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb1ELb1ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb0ELb1ELb1ELb1EEv16Flash_fwd_params 120 bytes stack frame, 100 bytes spill stores, 92 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb0ELb0ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb0ELb0ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 246 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb0ELb0ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb0ELb0ELb1EEv16Flash_fwd_params 112 bytes stack frame, 148 bytes spill stores, 140 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb0ELb1ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb0ELb1ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 238 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb0ELb1ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb0ELb1ELb1EEv16Flash_fwd_params 120 bytes stack frame, 96 bytes spill stores, 88 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb1ELb0ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb1ELb0ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 242 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb1ELb0ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb1ELb0ELb1EEv16Flash_fwd_params 104 bytes stack frame, 140 bytes spill stores, 132 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb1ELb1ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb1ELb1ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 246 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb1ELb1ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi64ELi64ELi4ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi64ELi64ELi4ES2_EELb1ELb1ELb1ELb1ELb1EEv16Flash_fwd_params 120 bytes stack frame, 108 bytes spill stores, 100 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb0ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb0ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 244 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb0ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb0ELb1EEv16Flash_fwd_params 96 bytes stack frame, 208 bytes spill stores, 184 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb1ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb1ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb1ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb0ELb1ELb1EEv16Flash_fwd_params 160 bytes stack frame, 140 bytes spill stores, 116 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb0ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb0ELb0EEv16Flash_fwd_params 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 242 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb0ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb0ELb1EEv16Flash_fwd_params 96 bytes stack frame, 140 bytes spill stores, 112 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb1ELb0EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb1ELb0EEv16Flash_fwd_params 16 bytes stack frame, 16 bytes spill stores, 16 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ptxas info : Compiling entry function '_Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb1ELb1EEv16Flash_fwd_params' for 'sm_80' ptxas info : Function properties for _Z16flash_fwd_kernelI23Flash_fwd_kernel_traitsILi256ELi128ELi64ELi8ELb0ELb0EN7cutlass6half_tE19Flash_kernel_traitsILi256ELi128ELi64ELi8ES2_EELb1ELb1ELb1ELb1ELb1EEv16Flash_fwd_params 88 bytes stack frame, 96 bytes spill stores, 124 bytes spill loads ptxas info : Used 255 registers, 576 bytes cmem[0] ninja: build stopped: subcommand failed. Traceback (most recent call last): File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1789, in _run_ninja_build subprocess.run( File "/opt/conda/lib/python3.8/subprocess.py", line 516, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-l8io2z6u/flash-attn_e9281ded0bb84ceca497ceb52704bbc6/setup.py", line 201, in <module>
setup(
File "/opt/conda/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/opt/conda/lib/python3.8/distutils/core.py", line 148, in setup
dist.run_commands()
File "/opt/conda/lib/python3.8/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/opt/conda/lib/python3.8/site-packages/setuptools/command/install.py", line 68, in run
return orig.install.run(self)
File "/opt/conda/lib/python3.8/distutils/command/install.py", line 545, in run
self.run_command('build')
File "/opt/conda/lib/python3.8/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/opt/conda/lib/python3.8/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/opt/conda/lib/python3.8/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/opt/conda/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/opt/conda/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 763, in build_extensions
build_ext.build_extensions(self)
File "/opt/conda/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
_build_ext.build_ext.build_extensions(self)
File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
self._build_extensions_serial()
File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
self.build_extension(ext)
File "/opt/conda/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
_build_ext.build_extension(self, ext)
File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 528, in build_extension
objects = self.compiler.compile(sources,
File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 584, in unix_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1468, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1805, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
----------------------------------------
ERROR: Command errored out with exit status 1: /opt/conda/bin/python3.8 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-l8io2z6u/flash-attn_e9281ded0bb84ceca497ceb52704bbc6/setup.py'"'"'; file='"'"'/tmp/pip-install-l8io2z6u/flash-attn_e9281ded0bb84ceca497ceb52704bbc6/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-2e7eto0b/install-record.txt --single-version-externally-managed --compile --install-headers /opt/conda/include/python3.8/flash-attn Check the logs for full command output.
I have a similar error when building for ubuntu 22.4, cuda 11.7, pytroch 2.0.1. If I find a solution and don't forget, I'll tell you.
Same... I'm on Ubuntu 22.04, CUDA 12.0, Torch 2.0.1+cu118, and I get this error:
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
subprocess.run(
File "/usr/lib/python3.10/subprocess.py", line 524, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-yhdeng66/flash-attn_79d88b3729674f029fef6777f74603c3/setup.py", line 202, in <module>
setup(
File "/usr/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 107, in setup
return distutils.core.setup(**attrs)
File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 1234, in run_command
super().run_command(command)
File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/usr/local/lib/python3.10/dist-packages/wheel/bdist_wheel.py", line 343, in run
self.run_command("build")
File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 1234, in run_command
super().run_command(command)
File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build.py", line 131, in run
self.run_command(cmd_name)
File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 1234, in run_command
super().run_command(command)
File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/usr/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 84, in run
_build_ext.run(self)
File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
self.build_extensions()
File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
build_ext.build_extensions(self)
File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
self._build_extensions_serial()
File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
self.build_extension(ext)
File "/usr/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 246, in build_extension
_build_ext.build_extension(self, ext)
File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
objects = self.compiler.compile(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1574, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for flash-attn
Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects
similar error here,
ptxas info : Function properties for _Z25flash_bwd_dot_do_o_kernelILb1E23Flash_bwd_kernel_traitsILi256ELi64ELi64ELi8ELi4ELi2ELi2ELb0ELb0EN7cutlass10bfloat16_tE19Flash_kernel_traitsILi256ELi64ELi64ELi8ES2_EEEv16Flash_bwd_params
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 696 bytes cmem[0]
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
subprocess.run(
File "/opt/conda/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "setup.py", line 271, in <module>
setup(
File "/opt/conda/lib/python3.8/site-packages/setuptools/__init__.py", line 107, in setup
return distutils.core.setup(**attrs)
File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/opt/conda/lib/python3.8/site-packages/setuptools/dist.py", line 1244, in run_command
super().run_command(command)
File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/opt/conda/lib/python3.8/site-packages/setuptools/command/install.py", line 80, in run
self.do_egg_install()
File "/opt/conda/lib/python3.8/site-packages/setuptools/command/install.py", line 129, in do_egg_install
self.run_command('bdist_egg')
File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/opt/conda/lib/python3.8/site-packages/setuptools/dist.py", line 1244, in run_command
super().run_command(command)
File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/opt/conda/lib/python3.8/site-packages/setuptools/command/bdist_egg.py", line 164, in run
cmd = self.call_command('install_lib', warn_dir=0)
File "/opt/conda/lib/python3.8/site-packages/setuptools/command/bdist_egg.py", line 150, in call_command
self.run_command(cmdname)
File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/opt/conda/lib/python3.8/site-packages/setuptools/dist.py", line 1244, in run_command
super().run_command(command)
File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/opt/conda/lib/python3.8/site-packages/setuptools/command/install_lib.py", line 11, in run
self.build()
File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/command/install_lib.py", line 111, in build
self.run_command('build_ext')
File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/opt/conda/lib/python3.8/site-packages/setuptools/dist.py", line 1244, in run_command
super().run_command(command)
File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/opt/conda/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 84, in run
_build_ext.run(self)
File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
self.build_extensions()
File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
build_ext.build_extensions(self)
File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
self._build_extensions_serial()
File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
self.build_extension(ext)
File "/opt/conda/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
_build_ext.build_extension(self, ext)
File "/opt/conda/lib/python3.8/site-packages/Cython/Distutils/build_ext.py", line 127, in build_extension
super(build_ext, self).build_extension(ext)
File "/opt/conda/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
objects = self.compiler.compile(
File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1574, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
We have pre-built CUDA wheels now that setup.py will automatically download.
cuda-12.3
FAILED: /mnt/c/Users/luoru/Desktop/flash-attention/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim64_bf16_sm80.o /usr/local/cuda/bin/nvcc -I/mnt/c/Users/luoru/Desktop/flash-attention/csrc/flash_attn -I/mnt/c/Users/luoru/Desktop/flash-attention/csrc/flash_attn/src -I/mnt/c/Users/luoru/Desktop/flash-attention/csrc/cutlass/include -I/home/luoruofeng/miniconda3/lib/python3.11/site-packages/torch/include -I/home/luoruofeng/miniconda3/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/luoruofeng/miniconda3/lib/python3.11/site-packages/torch/include/TH -I/home/luoruofeng/miniconda3/lib/python3.11/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/luoruofeng/miniconda3/include/python3.11 -c -c /mnt/c/Users/luoru/Desktop/flash-attention/csrc/flash_attn/src/flash_bwd_hdim64_bf16_sm80.cu -o /mnt/c/Users/luoru/Desktop/flash-attention/build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_bwd_hdim64_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 Killed ninja: build stopped: subcommand failed. Traceback (most recent call last): File "/home/luoruofeng/miniconda3/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 2100, in _run_ninja_build subprocess.run( File "/home/luoruofeng/miniconda3/lib/python3.11/subprocess.py", line 571, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/mnt/c/Users/luoru/Desktop/flash-attention/setup.py", line 285, in
same error. someone get it done? pls tell me
same error. someone get it done? pls tell me
Have you solved it? I have met the same problem……If you would share with it It will be appreciated
I'm getting the same error – I'd like to compile it myself, not use the pre-built CUDA wheels. Anyone know why this happens? I saw the message Killed
in the output, so tried it on a machine with 60 GiB of RAM, but it still failed.