text-embeddings-inference
Download of BAAI/bge-m3 fails on 1.5 using ONNX
System Info
- text-embeddings-inference version: 1.5
- OS: Windows/Debian 11
- Deployment: Docker
- Model: BAAI/bge-m3
Information
- [X] Docker
- [ ] The CLI directly
Tasks
- [X] An officially supported command
- [ ] My own modifications
Reproduction
Configuring TEI 1.5 (CPU) to run BAAI/bge-m3 in Docker (or Docker Compose) fails because the model is not fully downloaded, even though the model files on Hugging Face are downloadable and the onnx folder is present.
To reproduce, run
docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 --model-id BAAI/bge-m3
or
services:
  embeddings:
    image: "ghcr.io/huggingface/text-embeddings-inference:cpu-1.5"
    command: --model-id BAAI/bge-m3
    ports:
      - "8080:80"
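As a possible workaround sketch (untested assumption on my side): pre-fetch the full repository on the host, e.g. with `huggingface-cli download BAAI/bge-m3 --cache-dir ./hf-cache`, which does pull `onnx/model.onnx_data`, and mount that cache into the container so TEI finds everything locally. The log above shows `huggingface_hub_cache: Some("/data")`, so `/data` is the mount target:

```yaml
services:
  embeddings:
    image: "ghcr.io/huggingface/text-embeddings-inference:cpu-1.5"
    command: --model-id BAAI/bge-m3
    ports:
      - "8080:80"
    volumes:
      # Pre-populated Hub cache from the host; TEI's hub cache
      # defaults to /data inside the container (see log above).
      - ./hf-cache:/data
```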
The resulting output is the following:
2024-10-01T12:02:10.892818Z INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "BAA*/**e-m3", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "1e402b3ef386", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2024-10-01T12:02:10.893014Z INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
2024-10-01T12:02:10.959512Z INFO download_pool_config: text_embeddings_core::download: core/src/download.rs:38: Downloading `1_Pooling/config.json`
2024-10-01T12:02:12.155636Z INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:62: Downloading `config_sentence_transformers.json`
2024-10-01T12:02:12.418430Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:21: Starting download
2024-10-01T12:02:12.418475Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:23: Downloading `config.json`
2024-10-01T12:02:12.689863Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Downloading `tokenizer.json`
2024-10-01T12:02:15.212593Z INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:313: Downloading `model.onnx`
2024-10-01T12:02:15.337129Z WARN download_artifacts: text_embeddings_backend: backends/src/lib.rs:317: Could not download `model.onnx`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-m3/resolve/main/model.onnx)
2024-10-01T12:02:15.337216Z INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:318: Downloading `onnx/model.onnx`
2024-10-01T12:02:15.782935Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:32: Model artifacts downloaded in 3.364505011s
2024-10-01T12:02:16.281335Z INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 8192
2024-10-01T12:02:16.286095Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 4 tokenization workers
2024-10-01T12:02:17.421733Z INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
Error: Could not create backend
Caused by:
Could not start backend: Failed to create ONNX Runtime session: Deserialize tensor 0.auto_model.encoder.layer.16.attention.output.LayerNorm.weight failed.GetFileLength for /data/models--BAAI--bge-m3/snapshots/5617a9f61b028005a4858fdac845db406aefb181/onnx/model.onnx_data failed:Invalid fd was supplied: -1
Checking the downloaded files, I see the following blobs:
-rw-r--r-- 1 root root 54 Oct 1 12:55 0140ba1eac83a3c9b857d64baba91969d988624b
-rw-r--r-- 1 root root 123 Oct 1 12:55 1fba91c78a6c8e17227058ab6d4d3acb5d8630a9
-rw-r--r-- 1 root root 17M Oct 1 12:55 21106b6d7dab2952c1d496fb21d5dc9db75c28ed361a05f5020bbba27810dd08
-rw-r--r-- 1 root root 191 Oct 1 12:55 9bd85925f325e25246d94c4918dc02ab98f2a1b7
-rw-r--r-- 1 root root 687 Oct 1 12:55 e6eda1c72da8f9dc30fdd9b69c73d35af3b7a7ad
-rw-r--r-- 1 root root 708K Oct 1 12:55 f84251230831afb359ab26d9fd37d5936d4d9bb5d1d5410e66442f630f24435b
On the Hub, the onnx folder also contains a file named model.onnx_data, which is apparently missing from the download.
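This matches how ONNX external data works: models whose weights exceed 2 GB keep the graph in model.onnx and the tensors in a companion model.onnx_data file, and both must be present for the runtime session to load. A minimal sketch of artifact selection that accounts for the companion file (a hypothetical helper, not TEI's actual code; the fallback order mirrors the log above):

```python
def select_onnx_artifacts(repo_files):
    """Given the list of files in a Hub repo, pick the ONNX artifacts
    to download: prefer a root-level model.onnx, fall back to
    onnx/model.onnx, and include any external-data companion file."""
    for model_path in ["model.onnx", "onnx/model.onnx"]:
        if model_path in repo_files:
            selected = [model_path]
            # Large ONNX models store weights in a separate external-data
            # file next to the graph; it must be downloaded as well.
            data_path = model_path + "_data"
            if data_path in repo_files:
                selected.append(data_path)
            return selected
    return []

# BAAI/bge-m3 layout: no root-level model.onnx, weights split out
bge_m3 = ["config.json", "tokenizer.json",
          "onnx/model.onnx", "onnx/model.onnx_data"]
print(select_onnx_artifacts(bge_m3))
# → ['onnx/model.onnx', 'onnx/model.onnx_data']
```

With logic like this, the backend would fetch onnx/model.onnx_data alongside onnx/model.onnx instead of failing at session creation.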
Expected behavior
The model should be fully downloaded. If it cannot be, a clear error should state that an HTTP error occurred. Showing the expected download size and how much has been downloaded so far would also help.