flash-attention
error: command '/opt/conda/bin/nvcc' failed with exit code 255
I ran into this issue. My nvcc version is 11.7 and my gcc version is 11.1.
Building wheel for flash-attn (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [115 lines of output]
torch.__version__ = 1.13.1+cu117
fatal: not a git repository (or any of the parent directories): .git
running bdist_wheel
/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
  warnings.warn(msg.format('we could not find ninja.'))
running build
running build_py
creating build
creating build/lib.linux-x86_64-cpython-38
creating build/lib.linux-x86_64-cpython-38/flash_attn
copying flash_attn/__init__.py -> build/lib.linux-x86_64-cpython-38/flash_attn
copying flash_attn/attention_kernl.py -> build/lib.linux-x86_64-cpython-38/flash_attn
copying flash_attn/bert_padding.py -> build/lib.linux-x86_64-cpython-38/flash_attn
copying flash_attn/flash_attention.py -> build/lib.linux-x86_64-cpython-38/flash_attn
copying flash_attn/flash_attn_interface.py -> build/lib.linux-x86_64-cpython-38/flash_attn
copying flash_attn/flash_attn_triton.py -> build/lib.linux-x86_64-cpython-38/flash_attn
copying flash_attn/flash_attn_triton_og.py -> build/lib.linux-x86_64-cpython-38/flash_attn
copying flash_attn/flash_attn_triton_single_query.py -> build/lib.linux-x86_64-cpython-38/flash_attn
copying flash_attn/flash_attn_triton_tmp.py -> build/lib.linux-x86_64-cpython-38/flash_attn
copying flash_attn/flash_attn_triton_tmp_og.py -> build/lib.linux-x86_64-cpython-38/flash_attn
copying flash_attn/flash_attn_triton_varlen.py -> build/lib.linux-x86_64-cpython-38/flash_attn
copying flash_attn/flash_blocksparse_attention.py -> build/lib.linux-x86_64-cpython-38/flash_attn
copying flash_attn/flash_blocksparse_attn_interface.py -> build/lib.linux-x86_64-cpython-38/flash_attn
copying flash_attn/fused_softmax.py -> build/lib.linux-x86_64-cpython-38/flash_attn
copying flash_attn/rotary.py -> build/lib.linux-x86_64-cpython-38/flash_attn
creating build/lib.linux-x86_64-cpython-38/flash_attn/layers
copying flash_attn/layers/__init__.py -> build/lib.linux-x86_64-cpython-38/flash_attn/layers
copying flash_attn/layers/patch_embed.py -> build/lib.linux-x86_64-cpython-38/flash_attn/layers
copying flash_attn/layers/rotary.py -> build/lib.linux-x86_64-cpython-38/flash_attn/layers
creating build/lib.linux-x86_64-cpython-38/flash_attn/losses
copying flash_attn/losses/__init__.py -> build/lib.linux-x86_64-cpython-38/flash_attn/losses
copying flash_attn/losses/cross_entropy.py -> build/lib.linux-x86_64-cpython-38/flash_attn/losses
copying flash_attn/losses/cross_entropy_apex.py -> build/lib.linux-x86_64-cpython-38/flash_attn/losses
copying flash_attn/losses/cross_entropy_parallel.py -> build/lib.linux-x86_64-cpython-38/flash_attn/losses
creating build/lib.linux-x86_64-cpython-38/flash_attn/models
copying flash_attn/models/__init__.py -> build/lib.linux-x86_64-cpython-38/flash_attn/models
copying flash_attn/models/bert.py -> build/lib.linux-x86_64-cpython-38/flash_attn/models
copying flash_attn/models/gpt.py -> build/lib.linux-x86_64-cpython-38/flash_attn/models
copying flash_attn/models/gpt_j.py -> build/lib.linux-x86_64-cpython-38/flash_attn/models
copying flash_attn/models/gpt_neox.py -> build/lib.linux-x86_64-cpython-38/flash_attn/models
copying flash_attn/models/gptj.py -> build/lib.linux-x86_64-cpython-38/flash_attn/models
copying flash_attn/models/llama.py -> build/lib.linux-x86_64-cpython-38/flash_attn/models
copying flash_attn/models/opt.py -> build/lib.linux-x86_64-cpython-38/flash_attn/models
copying flash_attn/models/vit.py -> build/lib.linux-x86_64-cpython-38/flash_attn/models
creating build/lib.linux-x86_64-cpython-38/flash_attn/modules
copying flash_attn/modules/__init__.py -> build/lib.linux-x86_64-cpython-38/flash_attn/modules
copying flash_attn/modules/block.py -> build/lib.linux-x86_64-cpython-38/flash_attn/modules
copying flash_attn/modules/embedding.py -> build/lib.linux-x86_64-cpython-38/flash_attn/modules
copying flash_attn/modules/mha.py -> build/lib.linux-x86_64-cpython-38/flash_attn/modules
copying flash_attn/modules/mlp.py -> build/lib.linux-x86_64-cpython-38/flash_attn/modules
creating build/lib.linux-x86_64-cpython-38/flash_attn/ops
copying flash_attn/ops/__init__.py -> build/lib.linux-x86_64-cpython-38/flash_attn/ops
copying flash_attn/ops/activations.py -> build/lib.linux-x86_64-cpython-38/flash_attn/ops
copying flash_attn/ops/fused_dense.py -> build/lib.linux-x86_64-cpython-38/flash_attn/ops
copying flash_attn/ops/gelu_activation.py -> build/lib.linux-x86_64-cpython-38/flash_attn/ops
copying flash_attn/ops/layer_norm.py -> build/lib.linux-x86_64-cpython-38/flash_attn/ops
copying flash_attn/ops/rms_norm.py -> build/lib.linux-x86_64-cpython-38/flash_attn/ops
creating build/lib.linux-x86_64-cpython-38/flash_attn/triton
copying flash_attn/triton/__init__.py -> build/lib.linux-x86_64-cpython-38/flash_attn/triton
copying flash_attn/triton/fused_attention.py -> build/lib.linux-x86_64-cpython-38/flash_attn/triton
creating build/lib.linux-x86_64-cpython-38/flash_attn/utils
copying flash_attn/utils/__init__.py -> build/lib.linux-x86_64-cpython-38/flash_attn/utils
copying flash_attn/utils/benchmark.py -> build/lib.linux-x86_64-cpython-38/flash_attn/utils
copying flash_attn/utils/distributed.py -> build/lib.linux-x86_64-cpython-38/flash_attn/utils
copying flash_attn/utils/generation.py -> build/lib.linux-x86_64-cpython-38/flash_attn/utils
copying flash_attn/utils/pretrained.py -> build/lib.linux-x86_64-cpython-38/flash_attn/utils
running build_ext
building 'flash_attn_cuda' extension
creating build/temp.linux-x86_64-cpython-38
creating build/temp.linux-x86_64-cpython-38/csrc
creating build/temp.linux-x86_64-cpython-38/csrc/flash_attn
creating build/temp.linux-x86_64-cpython-38/csrc/flash_attn/src
gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/tmp/pip-install-tcki_6d6/flash-attn_22991b13d7124b4babf3995fda5fae0b/csrc/flash_attn -I/tmp/pip-install-tcki_6d6/flash-attn_22991b13d7124b4babf3995fda5fae0b/csrc/flash_attn/src -I/tmp/pip-install-tcki_6d6/flash-attn_22991b13d7124b4babf3995fda5fae0b/csrc/flash_attn/cutlass/include -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/opt/conda/include -I/opt/conda/include/python3.8 -c csrc/flash_attn/fmha_api.cpp -o build/temp.linux-x86_64-cpython-38/csrc/flash_attn/fmha_api.o -O3 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
cc1plus: warning: command-line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /tmp/pip-install-tcki_6d6/flash-attn_22991b13d7124b4babf3995fda5fae0b/csrc/flash_attn/src/fmha.h:42,
                 from csrc/flash_attn/fmha_api.cpp:33:
/tmp/pip-install-tcki_6d6/flash-attn_22991b13d7124b4babf3995fda5fae0b/csrc/flash_attn/src/fmha_utils.h: In function ‘void set_alpha(uint32_t&, float, Data_type)’:
/tmp/pip-install-tcki_6d6/flash-attn_22991b13d7124b4babf3995fda5fae0b/csrc/flash_attn/src/fmha_utils.h:63:53: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   63 |         alpha = reinterpret_cast<const uint32_t &>( h2 );
      |                                                     ^~
/tmp/pip-install-tcki_6d6/flash-attn_22991b13d7124b4babf3995fda5fae0b/csrc/flash_attn/src/fmha_utils.h:68:53: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   68 |         alpha = reinterpret_cast<const uint32_t &>( h2 );
      |                                                     ^~
/tmp/pip-install-tcki_6d6/flash-attn_22991b13d7124b4babf3995fda5fae0b/csrc/flash_attn/src/fmha_utils.h:70:53: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   70 |         alpha = reinterpret_cast<const uint32_t &>( norm );
      |                                                     ^~~~
csrc/flash_attn/fmha_api.cpp: In function ‘void set_params_fprop(FMHA_fprop_params&, size_t, size_t, size_t, size_t, size_t, at::Tensor, at::Tensor, at::Tensor, at::Tensor, void*, void*, void*, void*, void*, float, float, bool, int)’:
csrc/flash_attn/fmha_api.cpp:64:11: warning: ‘void* memset(void*, int, size_t)’ clearing an object of non-trivial type ‘struct FMHA_fprop_params’; use assignment or value-initialization instead [-Wclass-memaccess]
   64 |     memset(&params, 0, sizeof(params));
      |     ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from csrc/flash_attn/fmha_api.cpp:33:
/tmp/pip-install-tcki_6d6/flash-attn_22991b13d7124b4babf3995fda5fae0b/csrc/flash_attn/src/fmha.h:75:8: note: ‘struct FMHA_fprop_params’ declared here
   75 | struct FMHA_fprop_params : public Qkv_params {
      |        ^~~~~~~~~~~~~~~~~
csrc/flash_attn/fmha_api.cpp:60:15: warning: unused variable ‘acc_type’ [-Wunused-variable]
   60 |     Data_type acc_type = DATA_TYPE_FP32;
      |               ^~~~~~~~
csrc/flash_attn/fmha_api.cpp: In function ‘std::vector<at::Tensor> mha_fwd(const at::Tensor&, const at::Tensor&, const at::Tensor&, at::Tensor&, const at::Tensor&, const at::Tensor&, int, int, float, float, bool, bool, bool, int, c10::optional<at::Generator>)’:
csrc/flash_attn/fmha_api.cpp:208:10: warning: unused variable ‘is_sm80’ [-Wunused-variable]
  208 |     bool is_sm80 = dprops->major == 8 && dprops->minor == 0;
      |          ^~~~~~~
csrc/flash_attn/fmha_api.cpp: In function ‘std::vector<at::Tensor> mha_fwd_block(const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, int, int, float, float, bool, bool, c10::optional<at::Generator>)’:
csrc/flash_attn/fmha_api.cpp:533:10: warning: unused variable ‘is_sm80’ [-Wunused-variable]
  533 |     bool is_sm80 = dprops->major == 8 && dprops->minor == 0;
      |          ^~~~~~~
/opt/conda/bin/nvcc -I/tmp/pip-install-tcki_6d6/flash-attn_22991b13d7124b4babf3995fda5fae0b/csrc/flash_attn -I/tmp/pip-install-tcki_6d6/flash-attn_22991b13d7124b4babf3995fda5fae0b/csrc/flash_attn/src -I/tmp/pip-install-tcki_6d6/flash-attn_22991b13d7124b4babf3995fda5fae0b/csrc/flash_attn/cutlass/include -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/opt/conda/include -I/opt/conda/include/python3.8 -c csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.cu -o build/temp.linux-x86_64-cpython-38/csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
Command-line error: invalid option: --orig_src_path_name
1 catastrophic error detected in this compilation.
Compilation terminated.
error: command '/opt/conda/bin/nvcc' failed with exit code 255
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for flash-attn
Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects
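For anyone hitting the same error, a quick pre-flight check of the toolchain the build will actually pick up might look like this (generic commands, not taken from the report above):

# Sketch only: record the compilers and torch build the install will see.
which nvcc && nvcc --version    # CUDA compiler setup.py will invoke
gcc --version                   # host C++ compiler nvcc delegates to
python -c "import torch; print(torch.__version__, torch.version.cuda)"
pip install ninja               # optional; avoids the slow distutils fallback warned about in the log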
Can you try with gcc 10?
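One way to try that, as a minimal sketch (it assumes gcc-10/g++-10 are already installed and that the build honors CC/CXX, which can vary by setup):

# Sketch only: point the build at gcc-10/g++-10 before reinstalling.
export CC=gcc-10     # C compiler for the build
export CXX=g++-10    # C++ host compiler the CUDA build may hand off to
pip install flash-attn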
I tried with gcc 10 and got the same issue.
What's your nvcc version?
root@train-733924-worker-0:/usr/bin# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
That all looks reasonable; I have no idea why it fails. We recommend the PyTorch container from Nvidia, which has all the required tools to install FlashAttention.
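As a sketch of that route (the container tag below is just an example, not one given in this thread):

# Sketch only: start NVIDIA's PyTorch container, then build flash-attn inside it.
docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:23.05-py3
# inside the container:
pip install flash-attn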
Did you ever figure this out? I'm getting the same error.
You could run /opt/conda/bin/nvcc --version to check which CUDA compiler is actually being used.
bash: /opt/conda/bin/nvcc: No such file or directory
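A small follow-up sketch for locating whichever CUDA compiler (if any) the build would use; the fallback path below is an assumption, not output from this thread:

# Sketch only: find the nvcc that PyTorch's extension builder would pick up.
which nvcc || echo "no nvcc on PATH"
python -c "from torch.utils.cpp_extension import CUDA_HOME; print(CUDA_HOME)"
ls "${CUDA_HOME:-/usr/local/cuda}/bin/nvcc" 2>/dev/null || echo "nvcc not found under CUDA_HOME"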