
Error: Could not start backend: Runtime compute cap 70 is not compatible with compile time compute cap 80

Open abratnap opened this issue 1 year ago • 2 comments

System Info

When starting the container with Docker as shown below, I get an error:

docker run --gpus all -p 8912:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.2 --model-id $model

The CPU-only image runs fine.

Error: Could not create backend

Caused by:
    Could not start backend: Runtime compute cap 70 is not compatible with compile time compute cap 80
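
The message comes from a startup compatibility check: the prebuilt `:1.2` CUDA image was compiled for compute capability 80 (sm_80, Ampere), while a Tesla V100 reports compute capability 70 (sm_70). A minimal sketch of that comparison (illustrative shell, not TEI's actual Rust code; the variable names are assumptions):

```shell
# Hedged sketch of the check behind the error. A runtime cap lower than
# the compile-time cap means the GPU cannot run the compiled kernels.
runtime_compute_cap=70   # reported by the driver for a Tesla V100
compile_compute_cap=80   # baked into the prebuilt ghcr.io image

if [ "$runtime_compute_cap" -lt "$compile_compute_cap" ]; then
    echo "Runtime compute cap $runtime_compute_cap is not compatible with compile time compute cap $compile_compute_cap"
fi
```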

The nvidia-smi output is:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla V100-PCIE-16GB           Off |   00000000:04:01.0 Off |                    0 |
| N/A   30C    P0             38W /  250W |    7695MiB /  16384MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     63756      C   python                                       2564MiB |
|    0   N/A  N/A     63785      C   python                                       2564MiB |
|    0   N/A  N/A     63813      C   python                                       2564MiB |
+-----------------------------------------------------------------------------------------+

OS

Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.4 LTS
Release:	22.04
Codename:	jammy

Arch

x86_64

I also tried building the image with CUDA_COMPUTE_CAP set to 70:

runtime_compute_cap=70
docker build . -f Dockerfile-cuda --build-arg CUDA_COMPUTE_CAP=$runtime_compute_cap

Here the build fails with "cuda compute cap 70 is not supported":

------
 > [builder 2/9] RUN if [ 70 -ge 75 -a 70 -lt 80 ];     then          nvprune --generate-code code=sm_70 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a;     elif [ 70 -ge 80 -a 70 -lt 90 ];     then          nvprune --generate-code code=sm_80 --generate-code code=sm_70 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a;     elif [ 70 -eq 90 ];     then          nvprune --generate-code code=sm_90 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a;     else          echo "cuda compute cap 70 is not supported"; exit 1;     fi;:
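
The builder step above only takes a branch for compute caps 75-89 and 90; anything else (including 70) hits the final `else` and aborts the build. The same branching, restructured for readability as a function (the function name and echoed strings are illustrative, not from the Dockerfile):

```shell
# Sketch of the Dockerfile's nvprune branching. nvprune strips unused
# architectures from libcublas_static.a; unsupported caps fail the build.
cublas_prune_plan() {
    cap="$1"
    if [ "$cap" -ge 75 ] && [ "$cap" -lt 80 ]; then
        echo "nvprune to sm_${cap}"             # Turing (75)
    elif [ "$cap" -ge 80 ] && [ "$cap" -lt 90 ]; then
        echo "nvprune to sm_80 sm_${cap}"       # Ampere/Ada (80, 86, 89)
    elif [ "$cap" -eq 90 ]; then
        echo "nvprune to sm_90"                 # Hopper (90)
    else
        echo "cuda compute cap $cap is not supported"
    fi
}

cublas_prune_plan 70
```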

Information

  • [X] Docker
  • [ ] The CLI directly

Tasks

  • [ ] An officially supported command
  • [ ] My own modifications

Reproduction

  1. Use a machine with the configuration above
  2. Run the command below with any supported model:
docker run --gpus all -p 8912:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.2 --model-id $model

Expected behavior

The image should run on a V100 GPU.

abratnap avatar May 16 '24 04:05 abratnap

From the README I see that:

GPUs with Cuda compute capabilities < 7.5 are not supported (V100, Titan V, GTX 1000 series, ...).

Is there any way I can still use the GPU, even with reduced performance, or is it not possible at all?

abratnap avatar May 16 '24 04:05 abratnap

Did you find a solution? I'm on a V100 too 😫

3252152 avatar Aug 16 '24 11:08 3252152