
Download of BAAI/bge-m3 fails on 1.5 using ONNX

Open avvertix opened this issue 1 year ago • 6 comments

System Info

  • text-embeddings-inference version: 1.5
  • OS: Windows/Debian 11
  • Deployment: Docker
  • Model: BAAI/bge-m3

Information

  • [X] Docker
  • [ ] The CLI directly

Tasks

  • [X] An officially supported command
  • [ ] My own modifications

Reproduction

Configuring TEI 1.5-cpu to run BAAI/bge-m3 in Docker (or Docker Compose) results in the model not being downloaded, even though the model files on Hugging Face are downloadable and the onnx folder is present.

To replicate, run:

docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.5 --model-id BAAI/bge-m3

or

services:
    embeddings:
        image: "ghcr.io/huggingface/text-embeddings-inference:cpu-1.5"
        command: --model-id BAAI/bge-m3
        ports:
          - "8080:80"

The resulting output is:

2024-10-01T12:02:10.892818Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "BAA*/**e-m3", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "1e402b3ef386", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2024-10-01T12:02:10.893014Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
2024-10-01T12:02:10.959512Z  INFO download_pool_config: text_embeddings_core::download: core/src/download.rs:38: Downloading `1_Pooling/config.json`
2024-10-01T12:02:12.155636Z  INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:62: Downloading `config_sentence_transformers.json`
2024-10-01T12:02:12.418430Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:21: Starting download
2024-10-01T12:02:12.418475Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:23: Downloading `config.json`
2024-10-01T12:02:12.689863Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Downloading `tokenizer.json`
2024-10-01T12:02:15.212593Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:313: Downloading `model.onnx`
2024-10-01T12:02:15.337129Z  WARN download_artifacts: text_embeddings_backend: backends/src/lib.rs:317: Could not download `model.onnx`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-m3/resolve/main/model.onnx)
2024-10-01T12:02:15.337216Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:318: Downloading `onnx/model.onnx`
2024-10-01T12:02:15.782935Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:32: Model artifacts downloaded in 3.364505011s
2024-10-01T12:02:16.281335Z  INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 8192
2024-10-01T12:02:16.286095Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 4 tokenization workers
2024-10-01T12:02:17.421733Z  INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
Error: Could not create backend

Caused by:
    Could not start backend: Failed to create ONNX Runtime session: Deserialize tensor 0.auto_model.encoder.layer.16.attention.output.LayerNorm.weight failed.GetFileLength for /data/models--BAAI--bge-m3/snapshots/5617a9f61b028005a4858fdac845db406aefb181/onnx/model.onnx_data failed:Invalid fd was supplied: -1

Checking the downloaded files, I see the following blobs:

-rw-r--r-- 1 root root   54 Oct  1 12:55 0140ba1eac83a3c9b857d64baba91969d988624b
-rw-r--r-- 1 root root  123 Oct  1 12:55 1fba91c78a6c8e17227058ab6d4d3acb5d8630a9
-rw-r--r-- 1 root root  17M Oct  1 12:55 21106b6d7dab2952c1d496fb21d5dc9db75c28ed361a05f5020bbba27810dd08
-rw-r--r-- 1 root root  191 Oct  1 12:55 9bd85925f325e25246d94c4918dc02ab98f2a1b7
-rw-r--r-- 1 root root  687 Oct  1 12:55 e6eda1c72da8f9dc30fdd9b69c73d35af3b7a7ad
-rw-r--r-- 1 root root 708K Oct  1 12:55 f84251230831afb359ab26d9fd37d5936d4d9bb5d1d5410e66442f630f24435b

The onnx folder on the Hub also contains a file named model.onnx_data, which appears to be missing from the download.
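As a stdlib-only sketch, the condition can be checked against a downloaded snapshot directory like this (the helper name and the paths are assumptions based on the error message above, not TEI code):

```python
from pathlib import Path

def missing_external_data(snapshot_dir: Path) -> bool:
    """Return True when onnx/model.onnx was downloaded but its
    external-data companion onnx/model.onnx_data is absent.
    Hypothetical helper, not part of TEI."""
    model = snapshot_dir / "onnx" / "model.onnx"
    data = snapshot_dir / "onnx" / "model.onnx_data"
    return model.exists() and not data.exists()

# Example: point this at the snapshot directory from the error, e.g.
# /data/models--BAAI--bge-m3/snapshots/5617a9f61b028005a4858fdac845db406aefb181
```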

Expected behavior

The model should be fully downloaded. If it cannot be, a clear error should state that an HTTP error occurred. Showing the expected download size and how much has been downloaded so far would also help.
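The clearer error suggested here could look roughly like the following hypothetical check (function name, signature, and message are illustrative assumptions, not TEI's actual code):

```python
from pathlib import Path

def check_artifact(path: Path, expected_size: int) -> None:
    """Raise a descriptive error if a downloaded artifact is missing or
    truncated. Hypothetical sketch of the suggested error reporting."""
    actual = path.stat().st_size if path.exists() else 0
    if actual != expected_size:
        raise RuntimeError(
            f"Incomplete download of {path.name}: "
            f"got {actual} of {expected_size} expected bytes"
        )
```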

avvertix avatar Oct 01 '24 13:10 avvertix