panic without nvidia-smi
SYSTEM: Jetson, Linux (Ubuntu 20.04.6 LTS)
GitHub: https://github.com/rustai-solutions/candle_demo_yolov8
For some reason, the Jetson platform does not ship nvidia-smi.
Maybe panicking when nvidia-smi cannot be found is a bad idea?
output:
Updating git repository `https://github.com/huggingface/candle.git`
Updating git submodule `https://github.com/NVIDIA/cutlass.git`
Compiling gemm-f16 v0.16.15
Compiling candle-kernels v0.3.3 (https://github.com/huggingface/candle.git?branch=main#135ae5f3)
Compiling clap_builder v4.4.9
Compiling gif v0.12.0
Compiling exr v1.6.4
Compiling indicatif v0.17.7
Compiling gemm v0.16.15
error: failed to run custom build command for `candle-kernels v0.3.3 (https://github.com/huggingface/candle.git?branch=main#135ae5f3)`
Caused by:
process didn't exit successfully: `/home/ubuntu/Documents/candle_demo_yolov8/target/debug/build/candle-kernels-d138de79043c255c/build-script-build` (exit status: 101)
--- stdout
cargo:rerun-if-changed=build.rs
cargo:rustc-env=CUDA_INCLUDE_DIR=/usr/local/cuda/include
cargo:rerun-if-changed=src/
cargo:rerun-if-env-changed=CUDA_COMPUTE_CAP
--- stderr
thread 'main' panicked at /home/ubuntu/.cargo/git/checkouts/candle-0c2b4fa9e5801351/135ae5f/candle-kernels/build.rs:106:41:
Could not get Cuda compute cap: `nvidia-smi` failed. Ensure that you have CUDA installed and that `nvidia-smi` is in your PATH.
Caused by:
No such file or directory (os error 2)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
warning: build failed, waiting for other jobs to finish...
error: instruction requires: fullfp16
--> /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/gemm-common-0.16.15/src/simd.rs:1981:18
|
1981 | "fmla {0:v}.8h, {1:v}.8h, {2:v}.h[0]",
| ^
|
note: instantiated into assembly here
--> <inline asm>:1:2
|
1 | fmla v0.8h, v1.8h, v2.h[0]
| ^
error: instruction requires: fullfp16
--> /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/gemm-common-0.16.15/src/simd.rs:1987:18
|
1987 | "fmla {0:v}.8h, {1:v}.8h, {2:v}.h[1]",
| ^
|
note: instantiated into assembly here
--> <inline asm>:1:2
|
1 | fmla v0.8h, v1.8h, v2.h[1]
| ^
error: instruction requires: fullfp16
--> /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/gemm-common-0.16.15/src/simd.rs:1993:18
|
1993 | "fmla {0:v}.8h, {1:v}.8h, {2:v}.h[2]",
| ^
|
note: instantiated into assembly here
--> <inline asm>:1:2
|
1 | fmla v0.8h, v1.8h, v2.h[2]
| ^
error: instruction requires: fullfp16
--> /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/gemm-common-0.16.15/src/simd.rs:1999:18
|
1999 | "fmla {0:v}.8h, {1:v}.8h, {2:v}.h[3]",
| ^
|
note: instantiated into assembly here
--> <inline asm>:1:2
|
1 | fmla v0.8h, v1.8h, v2.h[3]
| ^
error: instruction requires: fullfp16
--> /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/gemm-common-0.16.15/src/simd.rs:2005:18
|
2005 | "fmla {0:v}.8h, {1:v}.8h, {2:v}.h[4]",
| ^
|
note: instantiated into assembly here
--> <inline asm>:1:2
|
1 | fmla v0.8h, v1.8h, v2.h[4]
| ^
error: instruction requires: fullfp16
--> /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/gemm-common-0.16.15/src/simd.rs:2011:18
|
2011 | "fmla {0:v}.8h, {1:v}.8h, {2:v}.h[5]",
| ^
|
note: instantiated into assembly here
--> <inline asm>:1:2
|
1 | fmla v0.8h, v1.8h, v2.h[5]
| ^
error: instruction requires: fullfp16
--> /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/gemm-common-0.16.15/src/simd.rs:2017:18
|
2017 | "fmla {0:v}.8h, {1:v}.8h, {2:v}.h[6]",
| ^
|
note: instantiated into assembly here
--> <inline asm>:1:2
|
1 | fmla v0.8h, v1.8h, v2.h[6]
| ^
error: instruction requires: fullfp16
--> /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/gemm-common-0.16.15/src/simd.rs:2023:18
|
2023 | "fmla {0:v}.8h, {1:v}.8h, {2:v}.h[7]",
| ^
|
note: instantiated into assembly here
--> <inline asm>:1:2
|
1 | fmla v0.8h, v1.8h, v2.h[7]
| ^
error: instruction requires: fullfp16
--> /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/gemm-common-0.16.15/src/simd.rs:1953:18
|
1953 | "fadd {0:v}.8h, {1:v}.8h, {2:v}.8h",
| ^
|
note: instantiated into assembly here
--> <inline asm>:1:2
|
1 | fadd v0.8h, v1.8h, v2.8h
| ^
error: instruction requires: fullfp16
--> /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/gemm-common-0.16.15/src/simd.rs:1965:18
|
1965 | "fmla {0:v}.8h, {1:v}.8h, {2:v}.8h",
| ^
|
note: instantiated into assembly here
--> <inline asm>:1:2
|
1 | fmla v0.8h, v1.8h, v2.8h
| ^
error: instruction requires: fullfp16
--> /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/gemm-common-0.16.15/src/simd.rs:1939:18
|
1939 | "fmul {0:v}.8h, {1:v}.8h, {2:v}.8h",
| ^
|
note: instantiated into assembly here
--> <inline asm>:1:2
|
1 | fmul v0.8h, v1.8h, v2.8h
| ^
error: could not compile `gemm-f16` (lib) due to 11 previous errors
nvidia-smi is used as a fallback to figure out the CUDA compute capability.
You can define the environment variable CUDA_COMPUTE_CAP to skip this.
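Several commenters suggest that reporting an error would be friendlier than panicking when nvidia-smi is missing. A minimal sketch of that idea, in an env-var-first, nvidia-smi-second order; the helper names are hypothetical and this is not candle's actual build.rs code (the `--query-gpu=compute_cap` flag also assumes a reasonably recent driver):

```rust
use std::env;
use std::process::Command;

/// Parse a compute capability such as "87" or "8.7" into an integer.
fn parse_compute_cap(val: &str) -> Result<u32, String> {
    let digits = val.trim().replace('.', "");
    digits
        .parse::<u32>()
        .map_err(|e| format!("invalid compute cap `{val}`: {e}"))
}

/// Hypothetical helper (not candle's API): resolve the compute capability,
/// returning an error that points the user at CUDA_COMPUTE_CAP instead of
/// panicking when `nvidia-smi` is absent (as on Jetson).
fn cuda_compute_cap() -> Result<u32, String> {
    // 1. Explicit override, e.g. CUDA_COMPUTE_CAP=87 for a Jetson Orin.
    if let Ok(val) = env::var("CUDA_COMPUTE_CAP") {
        return parse_compute_cap(&val);
    }
    // 2. Fallback: ask nvidia-smi, which may not exist on this platform.
    let out = Command::new("nvidia-smi")
        .args(["--query-gpu=compute_cap", "--format=csv,noheader"])
        .output()
        .map_err(|e| format!("`nvidia-smi` failed ({e}); set CUDA_COMPUTE_CAP instead"))?;
    parse_compute_cap(&String::from_utf8_lossy(&out.stdout))
}

fn main() {
    // "8.7" and "87" both mean sm_87.
    assert_eq!(parse_compute_cap("8.7"), Ok(87));
    assert_eq!(parse_compute_cap("87"), Ok(87));
    // Without the env var and without nvidia-smi this reports an error
    // instead of panicking, so callers can print a helpful message.
    match cuda_compute_cap() {
        Ok(cap) => println!("compute cap: {cap}"),
        Err(msg) => println!("{msg}"),
    }
}
```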
My issue is a bit different but related: I am building a Docker image to run later on a different host.
Is there a way to build for all the different compute capabilities instead of a specific one, so as to have a universal image able to run on any supported Nvidia GPU?
I am facing the same issue here when building a Docker image on a non-CUDA machine to run on a CUDA device:
RUN --mount=type=cache,target=/usr/local/cargo/registry \
--mount=type=cache,target=/root/workspace/target \
CUDA_COMPUTE_CAP=87 cargo build --features cuda,flash-attn,nccl --release --package candle_poc && \
cp target/release/candle_poc /opt/candle_poc/bin/
Notice the CUDA_COMPUTE_CAP=87 prefix on the cargo build command.
Hi! It would be very nice to add an error message explicitly suggesting CUDA_COMPUTE_CAP when nvidia-smi isn't found (which is common when building in a sandbox).
While we're at it, is it possible to specify a list of capabilities in CUDA_COMPUTE_CAP? Thanks!
P.S. A decent interface to follow might be that of https://cmake.org/cmake/help/latest/module/FindCUDAToolkit.html, which supports splayed CUDA installations, fatbins, driver stubs, etc.
CUDA_COMPUTE_CAP=all would be nice to have, for example to build generic container images that could run on any hardware.
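To illustrate the "list of capabilities" / `all` idea: a build script could expand the variable into multiple nvcc `-gencode` flags, producing a fatbin that runs on any listed GPU (plus embedded PTX for forward compatibility). This is a hedged sketch only; the function and the architecture set behind `all` are hypothetical, and candle does not currently support this:

```rust
/// Hypothetical sketch: expand a CUDA_COMPUTE_CAP value such as
/// "70,80,87" or "all" into the nvcc `-gencode` flags a build script
/// would pass to produce a fatbin.
fn gencode_flags(spec: &str) -> Vec<String> {
    // Assumed architecture set for "all"; the real set would depend on
    // the installed CUDA toolkit version.
    const ALL: &[u32] = &[61, 70, 72, 75, 80, 86, 87, 89, 90];
    let caps: Vec<u32> = if spec.trim() == "all" {
        ALL.to_vec()
    } else {
        spec.split(',')
            .filter_map(|s| s.trim().parse().ok())
            .collect()
    };
    // One SASS target per requested architecture...
    let mut flags: Vec<String> = caps
        .iter()
        .map(|c| format!("-gencode=arch=compute_{c},code=sm_{c}"))
        .collect();
    // ...plus PTX for the newest one, so future GPUs can JIT-compile.
    if let Some(max) = caps.iter().max() {
        flags.push(format!("-gencode=arch=compute_{max},code=compute_{max}"));
    }
    flags
}

fn main() {
    let flags = gencode_flags("70,87");
    assert_eq!(flags[0], "-gencode=arch=compute_70,code=sm_70");
    assert_eq!(flags[1], "-gencode=arch=compute_87,code=sm_87");
    assert_eq!(flags[2], "-gencode=arch=compute_87,code=compute_87");
    println!("{flags:?}");
}
```

The trade-off is build time and binary size: each extra architecture adds a full compilation pass and another cubin in the fatbin, which is why a toolkit-version-aware `all` (as in CMake's `CUDA_ARCHITECTURES=all`) is usually opt-in.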