text-embeddings-inference
Could not start backend: cannot find tensor embeddings.word_embeddings.weight
System Info
docker
docker run \
        -d \
        --name reranker \
        --gpus '"device=0"' \
        --env CUDA_VISIBLE_DEVICES=0 \
        -p 7863:80 \
        -v /data/ai/models:/data \
        ghcr.io/huggingface/text-embeddings-inference:86-1.5 \
        --model-id "/data/bge-reranker-base" \
        --dtype "float16" \
        --max-concurrent-requests 2048 \
        --max-batch-tokens 32768000 \
        --max-batch-requests 128 \
        --max-client-batch-size 4096 \
        --auto-truncate \
        --tokenization-workers 64 \
        --payload-limit 16000000
nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.142                Driver Version: 550.142        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        On  |   00000000:5E:00.0 Off |                  N/A |
| 42%   22C    P8             17W /  350W |   24237MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
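One thing worth noting in the nvidia-smi output above: roughly 24 GiB of the card's memory is shown as allocated, yet the process table is empty. Before restarting the container, it may be worth checking for stale compute processes still holding GPU memory (a standard nvidia-smi query, shown here as a sketch for device 0):

```shell
# List any compute processes still holding memory on GPU 0;
# an empty result with high memory usage can indicate a process
# nvidia-smi cannot see (e.g. inside another container/namespace)
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv -i 0
```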
Information
- [x] Docker
- [ ] The CLI directly
Tasks
- [x] An officially supported command
- [ ] My own modifications
Reproduction
docker run \
        -d \
        --name reranker \
        --gpus '"device=0"' \
        --env CUDA_VISIBLE_DEVICES=0 \
        -p 7863:80 \
        -v /data/ai/models:/data \
        ghcr.io/huggingface/text-embeddings-inference:86-1.5 \
        --model-id "/data/bge-reranker-base" \
        --dtype "float16" \
        --max-concurrent-requests 2048 \
        --max-batch-tokens 32768000 \
        --max-batch-requests 128 \
        --max-client-batch-size 4096 \
        --auto-truncate \
        --tokenization-workers 64 \
        --payload-limit 16000000
Expected behavior
The container had been running normally until I sent a request whose context was too long. After that, restarting the model with the same command fails with the "cannot find tensor embeddings.word_embeddings.weight" error above.
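Since the backend error complains about a missing tensor, one way to narrow this down is to check whether `embeddings.word_embeddings.weight` actually exists in the weight file on disk (a truncated or corrupted download would also surface this way). A minimal sketch, assuming the local model directory contains a `model.safetensors` file (the path and filename are assumptions): the safetensors format begins with an 8-byte little-endian header length followed by a JSON header keyed by tensor name, so the names can be listed with the standard library alone:

```python
import json
import struct


def safetensors_tensor_names(path):
    """Return the tensor names stored in a .safetensors file.

    The file starts with an 8-byte little-endian unsigned length,
    followed by that many bytes of JSON mapping tensor names to
    dtype/shape/offset metadata (plus an optional "__metadata__" key).
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return sorted(k for k in header if k != "__metadata__")


if __name__ == "__main__":
    # Hypothetical path; adjust to wherever the model actually lives
    for name in safetensors_tensor_names("/data/ai/models/bge-reranker-base/model.safetensors"):
        print(name)
```

If the expected key is absent (for example, the file only contains an unprefixed variant of the name, or the listing fails entirely because the file is truncated), re-downloading the model files would be the first thing to try.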