text-embeddings-inference
'ptxas' died due to signal 11 (Invalid memory reference)
System Info
- Version: v.1.4.0
- Cargo version: cargo 1.79.0 (ffa9cf99a 2024-06-03)
- GCC version: 11.4.1
- GPU: none on the build machine (CUDA 12.1 is installed). I compile with CUDA_COMPUTE_CAP=86 because I plan to run this container on an A40, but I don't have a GPU available for the build.
Information
- [x] Docker
- [ ] The CLI directly
Tasks
- [x] An officially supported command
- [ ] My own modifications
Reproduction
I run this script:
```shell
export CUDA_COMPUTE_CAP=86
export CUDA_HOME=/usr/local/cuda-12.1
export PATH=${PATH}:/usr/local/cuda-12.1/bin
# Limit parallelism
export CARGO_BUILD_JOBS=1
export RAYON_NUM_THREADS=1
export CARGO_BUILD_INCREMENTAL=true
cd /usr/src/text-embeddings-inference || true
nvprune \
    --generate-code code=sm_80 \
    --generate-code code=sm_${CUDA_COMPUTE_CAP} \
    /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a
cargo chef cook --release \
    --features candle-cuda \
    --features static-linking \
    --no-default-features \
    --recipe-path recipe.json && \
sccache -s
```
I get this error:
[18:29:50] : [Step 1/2] error: failed to run custom build command for `candle-flash-attn v0.5.0 (https://github.com/OlivierDehaene/candle?rev=33b7ecf9ed82bb7c20f1a94555218fabfbaa2fe3#33b7ecf9)`
[18:29:50] : [Step 1/2]
[18:29:50] : [Step 1/2] Caused by:
[18:29:50] : [Step 1/2]   process didn't exit successfully: `/usr/src/text-embeddings-inference/target/release/build/candle-flash-attn-67bc68aa050514c7/build-script-build` (exit status: 101)
[18:29:50] : [Step 1/2] --- stdout
[18:29:50] : [Step 1/2] cargo:rerun-if-changed=build.rs
[18:29:50] : [Step 1/2] cargo:rerun-if-changed=kernels/flash_api.cu
[18:29:50] : [Step 1/2] cargo:rerun-if-changed=kernels/flash_fwd_hdim128_fp16_sm80.cu
[18:29:50] : [Step 1/2] cargo:rerun-if-changed=kernels/flash_fwd_hdim160_fp16_sm80.cu
[18:29:50] : [Step 1/2] cargo:rerun-if-changed=kernels/flash_fwd_hdim192_fp16_sm80.cu
[18:29:50] : [Step 1/2] cargo:rerun-if-changed=kernels/flash_fwd_hdim224_fp16_sm80.cu
[18:29:50] : [Step 1/2] cargo:rerun-if-changed=kernels/flash_fwd_hdim256_fp16_sm80.cu
[18:29:50] : [Step 1/2] cargo:rerun-if-changed=kernels/flash_fwd_hdim32_fp16_sm80.cu
[18:29:50] : [Step 1/2] cargo:rerun-if-changed=kernels/flash_fwd_hdim64_fp16_sm80.cu
[18:29:50] : [Step 1/2] cargo:rerun-if-changed=kernels/flash_fwd_hdim96_fp16_sm80.cu
[18:29:50] : [Step 1/2] cargo:rerun-if-changed=kernels/flash_fwd_hdim128_bf16_sm80.cu
[18:29:50] : [Step 1/2] cargo:rerun-if-changed=kernels/flash_fwd_hdim160_bf16_sm80.cu
[18:29:50] : [Step 1/2] cargo:rerun-if-changed=kernels/flash_fwd_hdim192_bf16_sm80.cu
[18:29:50] : [Step 1/2] cargo:rerun-if-changed=kernels/flash_fwd_hdim224_bf16_sm80.cu
[18:29:50] : [Step 1/2] cargo:rerun-if-changed=kernels/flash_fwd_hdim256_bf16_sm80.cu
[18:29:50] : [Step 1/2] cargo:rerun-if-changed=kernels/flash_fwd_hdim32_bf16_sm80.cu
[18:29:50] : [Step 1/2] cargo:rerun-if-changed=kernels/flash_fwd_hdim64_bf16_sm80.cu
[18:29:50] : [Step 1/2] cargo:rerun-if-changed=kernels/flash_fwd_hdim96_bf16_sm80.cu
[18:29:50] : [Step 1/2] cargo:rerun-if-changed=kernels/flash_fwd_kernel.h
[18:29:50] : [Step 1/2] cargo:rerun-if-changed=kernels/flash_fwd_launch_template.h
[18:29:50] : [Step 1/2] cargo:rerun-if-changed=kernels/flash.h
[18:29:50] : [Step 1/2] cargo:rerun-if-changed=kernels/philox.cuh
[18:29:50] : [Step 1/2] cargo:rerun-if-changed=kernels/softmax.h
[18:29:50] : [Step 1/2] cargo:rerun-if-changed=kernels/utils.h
[18:29:50] : [Step 1/2] cargo:rerun-if-changed=kernels/kernel_traits.h
[18:29:50] : [Step 1/2] cargo:rerun-if-changed=kernels/block_info.h
[18:29:50] : [Step 1/2] cargo:rerun-if-changed=kernels/static_switch.h
[18:29:50] : [Step 1/2] cargo:info=["/usr", "/usr/local/cuda", "/opt/cuda", "/usr/lib/cuda", "C:/Program Files/NVIDIA GPU Computing Toolkit", "C:/CUDA"]
[18:29:50] : [Step 1/2] cargo:rerun-if-env-changed=CUDA_COMPUTE_CAP
[18:29:50] : [Step 1/2] cargo:rustc-env=CUDA_COMPUTE_CAP=86
[18:29:50] : [Step 1/2]
[18:29:50] : [Step 1/2] --- stderr
[....]
[18:29:50] : [Step 1/2] #$ CUDAFE_FLAGS=
[18:29:50] : [Step 1/2] #$ PTXAS_FLAGS=
[18:29:50] : [Step 1/2] #$ gcc -std=c++17 -D__CUDA_ARCH_LIST__=860 -E -x c++ -D__CUDACC__ -D__NVCC__ -D__CUDACC_EXTENDED_LAMBDA__ -D__CUDACC_RELAXED_CONSTEXPR__ -O3 -I"cutlass/include" "-I/usr/local/cuda-12.1/bin/../targets/x86_64-linux/include" -U "__CUDA_NO_HALF_OPERATORS__" -U "__CUDA_NO_HALF_CONVERSIONS__" -U "__CUDA_NO_HALF2_OPERATORS__" -U "__CUDA_NO_BFLOAT16_CONVERSIONS__" -D__CUDACC_VER_MAJOR__=12 -D__CUDACC_VER_MINOR__=1 -D__CUDACC_VER_BUILD__=105 -D__CUDA_API_VER_MAJOR__=12 -D__CUDA_API_VER_MINOR__=1 -DCUDA_API_PER_THREAD_DEFAULT_STREAM=1 -D__NVCC_DIAG_PRAGMA_SUPPORT__=1 -include "cuda_runtime.h" -m64 "kernels/flash_fwd_hdim32_bf16_sm80.cu" -o "/tmp/tmpxft_000017c2_00000000-5_flash_fwd_hdim32_bf16_sm80.cpp4.ii"
[18:29:50] : [Step 1/2] #$ cudafe++ --c++17 --gnu_version=110401 --display_error_number --orig_src_file_name "kernels/flash_fwd_hdim32_bf16_sm80.cu" --orig_src_path_name "/root/.cargo/git/checkouts/candle-2c6db576e0f06e81/33b7ecf/candle-flash-attn/kernels/flash_fwd_hdim32_bf16_sm80.cu" --allow_managed --extended-lambda --relaxed_constexpr --m64 --parse_templates --gen_c_file_name "/tmp/tmpxft_000017c2_00000000-6_flash_fwd_hdim32_bf16_sm80.cudafe1.cpp" --stub_file_name "tmpxft_000017c2_00000000-6_flash_fwd_hdim32_bf16_sm80.cudafe1.stub.c" --gen_module_id_file --module_id_file_name "/tmp/tmpxft_000017c2_00000000-4_flash_fwd_hdim32_bf16_sm80.module_id" "/tmp/tmpxft_000017c2_00000000-5_flash_fwd_hdim32_bf16_sm80.cpp4.ii"
[18:29:50] : [Step 1/2] #$ gcc -std=c++17 -D__CUDA_ARCH__=860 -D__CUDA_ARCH_LIST__=860 -E -x c++ -DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDACC__ -D__NVCC__ -D__CUDACC_EXTENDED_LAMBDA__ -D__CUDACC_RELAXED_CONSTEXPR__ -O3 -I"cutlass/include" "-I/usr/local/cuda-12.1/bin/../targets/x86_64-linux/include" -U "__CUDA_NO_HALF_OPERATORS__" -U "__CUDA_NO_HALF_CONVERSIONS__" -U "__CUDA_NO_HALF2_OPERATORS__" -U "__CUDA_NO_BFLOAT16_CONVERSIONS__" -D__CUDACC_VER_MAJOR__=12 -D__CUDACC_VER_MINOR__=1 -D__CUDACC_VER_BUILD__=105 -D__CUDA_API_VER_MAJOR__=12 -D__CUDA_API_VER_MINOR__=1 -DCUDA_API_PER_THREAD_DEFAULT_STREAM=1 -D__NVCC_DIAG_PRAGMA_SUPPORT__=1 -include "cuda_runtime.h" -m64 "kernels/flash_fwd_hdim32_bf16_sm80.cu" -o "/tmp/tmpxft_000017c2_00000000-7_flash_fwd_hdim32_bf16_sm80.cpp1.ii"
[18:29:50] : [Step 1/2] #$ cicc --c++17 --gnu_version=110401 --display_error_number --orig_src_file_name "kernels/flash_fwd_hdim32_bf16_sm80.cu" --orig_src_path_name "/root/.cargo/git/checkouts/candle-2c6db576e0f06e81/33b7ecf/candle-flash-attn/kernels/flash_fwd_hdim32_bf16_sm80.cu" --allow_managed --extended-lambda --relaxed_constexpr -arch compute_86 -m64 --no-version-ident -ftz=1 -prec_div=0 -prec_sqrt=0 -fmad=1 -fast-math --gen_div_approx_ftz --include_file_name "tmpxft_000017c2_00000000-3_flash_fwd_hdim32_bf16_sm80.fatbin.c" -tused --module_id_file_name "/tmp/tmpxft_000017c2_00000000-4_flash_fwd_hdim32_bf16_sm80.module_id" --gen_c_file_name "/tmp/tmpxft_000017c2_00000000-6_flash_fwd_hdim32_bf16_sm80.cudafe1.c" --stub_file_name "/tmp/tmpxft_000017c2_00000000-6_flash_fwd_hdim32_bf16_sm80.cudafe1.stub.c" --gen_device_file_name "/tmp/tmpxft_000017c2_00000000-6_flash_fwd_hdim32_bf16_sm80.cudafe1.gpu" "/tmp/tmpxft_000017c2_00000000-7_flash_fwd_hdim32_bf16_sm80.cpp1.ii" -o "/tmp/tmpxft_000017c2_00000000-6_flash_fwd_hdim32_bf16_sm80.ptx"
[18:29:50] : [Step 1/2] #$ ptxas -arch=sm_86 -m64 "/tmp/tmpxft_000017c2_00000000-6_flash_fwd_hdim32_bf16_sm80.ptx" -o "/tmp/tmpxft_000017c2_00000000-8_flash_fwd_hdim32_bf16_sm80.sm_86.cubin"
[18:29:50] : [Step 1/2] nvcc error : 'ptxas' died due to signal 11 (Invalid memory reference)
[18:29:50] : [Step 1/2] nvcc error : 'ptxas' core dumped
[18:29:50] : [Step 1/2] # --error 0x8b --
[18:29:50] : [Step 1/2] thread '<unnamed>' panicked at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/bindgen_cuda-0.1.5/src/lib.rs:262:21:
[18:29:50] : [Step 1/2] nvcc error while executing compiling: "nvcc" "--gpu-architecture=sm_86" "-c" "-o" "/usr/src/text-embeddings-inference/target/release/build/candle-flash-attn-6656f6d321f9dddf/out/flash_fwd_hdim32_bf16_sm80-aca7d8fdce93ef53.o" "--default-stream" "per-thread" "-std=c++17" "-O3" "-U__CUDA_NO_HALF_OPERATORS__" "-U__CUDA_NO_HALF_CONVERSIONS__" "-U__CUDA_NO_HALF2_OPERATORS__" "-U__CUDA_NO_BFLOAT16_CONVERSIONS__" "-Icutlass/include" "--expt-relaxed-constexpr" "--expt-extended-lambda" "--use_fast_math" "--verbose" "kernels/flash_fwd_hdim32_bf16_sm80.cu"
[18:29:50] : [Step 1/2]
[18:29:50] : [Step 1/2] # stdout
[18:29:50] : [Step 1/2]
[18:29:50] : [Step 1/2]
[18:29:50] : [Step 1/2] # stderr
[18:29:50] : [Step 1/2]
[18:29:50] : [Step 1/2] note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[18:29:51] : [Step 1/2] thread 'main' panicked at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cargo-chef-0.1.67/src/recipe.rs:218:27:
[18:29:51] : [Step 1/2] Exited with status code: 101
[18:29:51] : [Step 1/2] note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[18:29:59]W: [Step 1/2] The command '/bin/sh -c docker/build' returned a non-zero code: 101
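Since the crash happens inside `ptxas` itself rather than in the Rust build, the exact versions of the CUDA tools in the build image are probably relevant. Below is a minimal sketch for collecting them, assuming a POSIX shell inside the build container. `report_tool` is a hypothetical helper name used only for illustration; it is not part of TEI or the CUDA toolkit.

```shell
#!/bin/sh
# Hypothetical helper: print where a tool resolves on PATH and the tail of
# its --version banner, or a "not found" line if it is missing.
report_tool() {
    tool="$1"
    if path=$(command -v "$tool" 2>/dev/null); then
        echo "$tool: $path"
        # The last lines of the banner usually carry the release/build string.
        "$tool" --version 2>&1 | tail -n 2
    else
        echo "$tool: not found on PATH"
    fi
}

# The three CUDA tools that appear in the script and the log above.
for t in nvcc ptxas nvprune; do
    report_tool "$t"
done
```

Running this in the same image, after the `export PATH=...` line from the script, should show whether `nvcc` and `ptxas` both resolve to the CUDA 12.1 installation.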
Expected behavior
TEI compiles successfully.