5090 support soon?
I have a 5090 and tried the following:
1. Running the prebuilt Hopper image:
docker run ... ghcr.io/huggingface/text-embeddings-inference:hopper-1.7 ...
ERROR text_embeddings_backend: backends/src/lib.rs:388: Could not start Candle backend: Could not start backend: Runtime compute cap 120 is not compatible with compile time compute cap 90
Error: Could not create backend
Caused by:
Could not start backend: Could not start a suitable backend
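For reference, the mismatch is easy to confirm from the driver side; this is a generic nvidia-smi query, nothing TEI-specific:

# Prints the compute capability the driver reports. An RTX 5090
# (Blackwell) returns 12.0, while the hopper-1.7 image is compiled
# only for compute cap 9.0 (sm_90), hence the error above.
nvidia-smi --query-gpu=compute_cap --format=csv,noheader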
2. Building the image locally with the card's compute cap:
runtime_compute_cap=120
docker build . -f Dockerfile-cuda --build-arg CUDA_COMPUTE_CAP=$runtime_compute_cap
......
=> CANCELED [planner 7/7] RUN cargo chef prepare --recipe-path recipe.json 0.6s
------
> [builder 2/9] RUN --mount=type=secret,id=actions_results_url,env=ACTIONS_RESULTS_URL --mount=type=secret,id=actions_runtime_token,env=ACTIONS_RUNTIME_TOKEN if [ 120 -ge 75 -a 120 -lt 80 ]; then nvprune --generate-code code=sm_120 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a; elif [ 120 -ge 80 -a 120 -lt 90 ]; then nvprune --generate-code code=sm_80 --generate-code code=sm_120 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a; elif [ 120 -eq 90 ]; then nvprune --generate-code code=sm_90 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a; else echo "cuda compute cap 120 is not supported"; exit 1; fi;:
0.329 cuda compute cap 120 is not supported
------
Dockerfile-cuda:50
--------------------
49 |
50 | >>> RUN --mount=type=secret,id=actions_results_url,env=ACTIONS_RESULTS_URL \
51 | >>> --mount=type=secret,id=actions_runtime_token,env=ACTIONS_RUNTIME_TOKEN \
52 | >>> if [ ${CUDA_COMPUTE_CAP} -ge 75 -a ${CUDA_COMPUTE_CAP} -lt 80 ]; \
53 | >>> then \
54 | >>> nvprune --generate-code code=sm_${CUDA_COMPUTE_CAP} /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a; \
55 | >>> elif [ ${CUDA_COMPUTE_CAP} -ge 80 -a ${CUDA_COMPUTE_CAP} -lt 90 ]; \
56 | >>> then \
57 | >>> nvprune --generate-code code=sm_80 --generate-code code=sm_${CUDA_COMPUTE_CAP} /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a; \
58 | >>> elif [ ${CUDA_COMPUTE_CAP} -eq 90 ]; \
59 | >>> then \
60 | >>> nvprune --generate-code code=sm_90 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a; \
61 | >>> else \
62 | >>> echo "cuda compute cap ${CUDA_COMPUTE_CAP} is not supported"; exit 1; \
63 | >>> fi;
64 |
--------------------
ERROR: failed to solve: process "/bin/sh -c if [ ${CUDA_COMPUTE_CAP} -ge 75 -a ${CUDA_COMPUTE_CAP} -lt 80 ]; then nvprune --generate-code code=sm_${CUDA_COMPUTE_CAP} /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a; elif [ ${CUDA_COMPUTE_CAP} -ge 80 -a ${CUDA_COMPUTE_CAP} -lt 90 ]; then nvprune --generate-code code=sm_80 --generate-code code=sm_${CUDA_COMPUTE_CAP} /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a; elif [ ${CUDA_COMPUTE_CAP} -eq 90 ]; then nvprune --generate-code code=sm_90 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a; else echo \"cuda compute cap ${CUDA_COMPUTE_CAP} is not supported\"; exit 1; fi;" did not complete successfully: exit code: 1
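Looking at the excerpt above, the nvprune step only has branches for compute caps 75-89 and exactly 90, so 120 always falls through to the else and aborts. A naive extension would add a Blackwell branch along these lines (untested sketch; even with it, the build may still fail if the base image's CUDA toolkit ships no sm_120 code in libcublas_static.a):

# Hypothetical extra branch for Blackwell, mirroring the existing ones:
elif [ ${CUDA_COMPUTE_CAP} -eq 120 ]; \
then \
    nvprune --generate-code code=sm_120 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a; \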
soooo sad
Hey @trillionmonster I'm afraid we don't support Blackwell yet, but rather only up to Hopper, I'll ping you if we end up adding support for Blackwell anytime soon! 🤗
still waiting
If you have access to the hardware, would you like to implement support?
Hey @trillionmonster @sempervictus, thanks to @danielealbano, support for Blackwell might come soon as per https://github.com/huggingface/text-embeddings-inference/pull/735, although it will remain preview / experimental for now: Blackwell requires bumping CUDA, and that bump might impact Turing, Ampere and Hopper deployments that run on earlier CUDA versions.
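For context on why the CUDA bump is unavoidable: you can list the GPU targets a given toolkit can compile for, and as far as I know sm_120 (consumer Blackwell such as the 5090) only appears starting with CUDA 12.8:

# Lists every sm_XX code target this nvcc can generate; sm_120 is
# absent before CUDA 12.8.
nvcc --list-gpu-code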
Also @trillionmonster, please try to be a bit more patient and considerate in future issues; open-source maintainers usually reply as soon as they can 🙏🏻