
RTX 5090 support soon?

Open trillionmonster opened this issue 7 months ago • 5 comments

I have an RTX 5090 and have tried the following:

1. docker run  ... ghcr.io/huggingface/text-embeddings-inference:hopper-1.7  ...

ERROR text_embeddings_backend: backends/src/lib.rs:388: Could not start Candle backend: Could not start backend: Runtime compute cap 120 is not compatible with compile time compute cap 90
Error: Could not create backend

Caused by:
    Could not start backend: Could not start a suitable backend
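The error above comes from a mismatch: the `hopper-1.7` image was compiled for compute capability 90 (sm_90, Hopper), while the RTX 5090 reports 120 (sm_120, Blackwell). As a hedged sketch, you can confirm what your GPU reports and convert it to the integer form the backend compares against; the `"12.0"` sample value below is an assumption standing in for actual `nvidia-smi` output:

```shell
# On a machine with an NVIDIA driver, query the GPU's compute capability:
#   cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader)
# An RTX 5090 reports "12.0"; we use that as a sample value here.
cap="12.0"
# Drop the dot to get the integer form used in the error message (120).
compute_cap=$(echo "$cap" | tr -d '.')
echo "$compute_cap"
```

If this prints anything other than the compute cap the image was built for, the Candle backend refuses to start, as seen above.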


2. docker build

runtime_compute_cap=120

docker build . -f Dockerfile-cuda --build-arg CUDA_COMPUTE_CAP=$runtime_compute_cap

......

=> CANCELED [planner 7/7] RUN cargo chef prepare  --recipe-path recipe.json   0.6s
------
 > [builder 2/9] RUN --mount=type=secret,id=actions_results_url,env=ACTIONS_RESULTS_URL     --mount=type=secret,id=actions_runtime_token,env=ACTIONS_RUNTIME_TOKEN     if [ 120 -ge 75 -a 120 -lt 80 ];     then          nvprune --generate-code code=sm_120 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a;     elif [ 120 -ge 80 -a 120 -lt 90 ];     then          nvprune --generate-code code=sm_80 --generate-code code=sm_120 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a;     elif [ 120 -eq 90 ];     then          nvprune --generate-code code=sm_90 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a;     else          echo "cuda compute cap 120 is not supported"; exit 1;     fi;:
0.329 cuda compute cap 120 is not supported
------
Dockerfile-cuda:50
--------------------
  49 |     
  50 | >>> RUN --mount=type=secret,id=actions_results_url,env=ACTIONS_RESULTS_URL \
  51 | >>>     --mount=type=secret,id=actions_runtime_token,env=ACTIONS_RUNTIME_TOKEN \
  52 | >>>     if [ ${CUDA_COMPUTE_CAP} -ge 75 -a ${CUDA_COMPUTE_CAP} -lt 80 ]; \
  53 | >>>     then  \
  54 | >>>         nvprune --generate-code code=sm_${CUDA_COMPUTE_CAP} /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a; \
  55 | >>>     elif [ ${CUDA_COMPUTE_CAP} -ge 80 -a ${CUDA_COMPUTE_CAP} -lt 90 ]; \
  56 | >>>     then  \
  57 | >>>         nvprune --generate-code code=sm_80 --generate-code code=sm_${CUDA_COMPUTE_CAP} /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a; \
  58 | >>>     elif [ ${CUDA_COMPUTE_CAP} -eq 90 ]; \
  59 | >>>     then  \
  60 | >>>         nvprune --generate-code code=sm_90 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a; \
  61 | >>>     else  \
  62 | >>>         echo "cuda compute cap ${CUDA_COMPUTE_CAP} is not supported"; exit 1; \
  63 | >>>     fi;
  64 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c if [ ${CUDA_COMPUTE_CAP} -ge 75 -a ${CUDA_COMPUTE_CAP} -lt 80 ];     then          nvprune --generate-code code=sm_${CUDA_COMPUTE_CAP} /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a;     elif [ ${CUDA_COMPUTE_CAP} -ge 80 -a ${CUDA_COMPUTE_CAP} -lt 90 ];     then          nvprune --generate-code code=sm_80 --generate-code code=sm_${CUDA_COMPUTE_CAP} /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a;     elif [ ${CUDA_COMPUTE_CAP} -eq 90 ];     then          nvprune --generate-code code=sm_90 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a;     else          echo \"cuda compute cap ${CUDA_COMPUTE_CAP} is not supported\"; exit 1;     fi;" did not complete successfully: exit code: 1
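The build fails because the `nvprune` dispatch in `Dockerfile-cuda` only handles compute caps 75–90; anything newer falls through to the error branch. A minimal sketch of that branch logic (with the `nvprune` calls replaced by echoes, since the point here is only the control flow) shows why `CUDA_COMPUTE_CAP=120` is rejected:

```shell
# Reproduce the Dockerfile's branch logic for CUDA_COMPUTE_CAP=120.
# nvprune calls are stubbed out with echoes; only the dispatch matters.
CUDA_COMPUTE_CAP=120
if [ "$CUDA_COMPUTE_CAP" -ge 75 ] && [ "$CUDA_COMPUTE_CAP" -lt 80 ]; then
    msg="would prune libcublas_static.a for sm_${CUDA_COMPUTE_CAP}"
elif [ "$CUDA_COMPUTE_CAP" -ge 80 ] && [ "$CUDA_COMPUTE_CAP" -lt 90 ]; then
    msg="would prune libcublas_static.a for sm_80 and sm_${CUDA_COMPUTE_CAP}"
elif [ "$CUDA_COMPUTE_CAP" -eq 90 ]; then
    msg="would prune libcublas_static.a for sm_90"
else
    # 120 lands here: no branch knows about Blackwell.
    msg="cuda compute cap ${CUDA_COMPUTE_CAP} is not supported"
fi
echo "$msg"
```

So building with `--build-arg CUDA_COMPUTE_CAP=120` cannot work until the Dockerfile (and the CUDA toolchain it pins) gains an sm_120 branch.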

trillionmonster avatar Jun 16 '25 14:06 trillionmonster

soooo sad

trillionmonster avatar Jun 17 '25 01:06 trillionmonster

Hey @trillionmonster I'm afraid we don't support Blackwell yet, but rather only up to Hopper, I'll ping you if we end up adding support for Blackwell anytime soon! 🤗

alvarobartt avatar Aug 19 '25 09:08 alvarobartt

> Hey @trillionmonster I'm afraid we don't support Blackwell yet, but rather only up to Hopper, I'll ping you if we end up adding support for Blackwell anytime soon! 🤗

still waiting

trillionmonster avatar Sep 28 '25 03:09 trillionmonster

> still waiting

If you have access to the hardware, would you like to implement support?

sempervictus avatar Oct 05 '25 01:10 sempervictus

Hey @trillionmonster @sempervictus, thanks to @danielealbano, support for Blackwell might come soon via https://github.com/huggingface/text-embeddings-inference/pull/735, even if under preview / experimental at the moment, given that bumping CUDA is a requirement for Blackwell and that might impact Turing, Ampere and Hopper deployments that run on earlier CUDA versions.

Also @trillionmonster, please try to be a bit more patient and considerate in future issues, as open-source maintainers often reply as soon as they can 🙏🏻

alvarobartt avatar Oct 08 '25 14:10 alvarobartt