text-embeddings-inference
multilingual-e5-large exported by a recent sentence-transformers version cannot be loaded
System Info
Tested TEI versions:
- v1.2.0 (official Docker)
- v1.2.3 (official Docker)
- cc1c510 (current main, built on Ubuntu 23.10, cargo 1.75.0)
Since the failure already occurs during model loading, the hardware specs should not matter.
Information
- [X] Docker
- [X] The CLI directly
Tasks
- [X] An officially supported command
- [ ] My own modifications
Reproduction
- Install a Python 3.11 venv with up-to-date sentence-transformers and tokenizers:

  ```
  sentence-transformers==2.7.0
  tokenizers==0.19.1
  transformers==4.40.2
  ```
- Load `intfloat/multilingual-e5-large` and export it again to disk:

  ```python
  from sentence_transformers import SentenceTransformer

  e5 = SentenceTransformer("intfloat/multilingual-e5-large")
  e5.save("multilingual-e5-large")
  ```
- Run TEI on the exported model. The server does not start and emits the following:
  ```
  tokenizer.json not found. text-embeddings-inference only supports fast tokenizers: Error("data did not match any variant of untagged enum PreTokenizerWrapper", line: 69, column: 3)
  stack backtrace:
     0: rust_begin_unwind
              at /build/rustc-mQ6oHL/rustc-1.75.0+dfsg0ubuntu1~bpo10/library/std/src/panicking.rs:645:5
     1: core::panicking::panic_fmt
              at /build/rustc-mQ6oHL/rustc-1.75.0+dfsg0ubuntu1~bpo10/library/core/src/panicking.rs:72:14
     2: core::result::unwrap_failed
              at /build/rustc-mQ6oHL/rustc-1.75.0+dfsg0ubuntu1~bpo10/library/core/src/result.rs:1653:5
     3: core::result::Result<T,E>::expect
              at /build/rustc-mQ6oHL/rustc-1.75.0+dfsg0ubuntu1~bpo10/library/core/src/result.rs:1034:23
     4: text_embeddings_router::run::{{closure}}
              at ./router/src/lib.rs:137:25
     5: text_embeddings_router::main::{{closure}}
              at ./router/src/main.rs:163:6
     6: tokio::runtime::park::CachedParkThread::block_on::{{closure}}
              at /home/jvass/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/park.rs:281:63
     7: tokio::runtime::coop::with_budget
              at /home/jvass/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/coop.rs:107:5
     8: tokio::runtime::coop::budget
              at /home/jvass/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/coop.rs:73:5
     9: tokio::runtime::park::CachedParkThread::block_on
              at /home/jvass/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/park.rs:281:31
    10: tokio::runtime::context::blocking::BlockingRegionGuard::block_on
              at /home/jvass/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/context/blocking.rs:66:9
    11: tokio::runtime::scheduler::multi_thread::MultiThread::block_on::{{closure}}
              at /home/jvass/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/scheduler/multi_thread/mod.rs:87:13
    12: tokio::runtime::context::runtime::enter_runtime
              at /home/jvass/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/context/runtime.rs:65:16
    13: tokio::runtime::scheduler::multi_thread::MultiThread::block_on
              at /home/jvass/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/scheduler/multi_thread/mod.rs:86:9
    14: tokio::runtime::runtime::Runtime::block_on
              at /home/jvass/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/runtime.rs:351:45
    15: text_embeddings_router::main
              at ./router/src/main.rs:165:5
    16: core::ops::function::FnOnce::call_once
              at /build/rustc-mQ6oHL/rustc-1.75.0+dfsg0ubuntu1~bpo10/library/core/src/ops/function.rs:250:5
  note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
  ```
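The parse failure can also be reproduced outside of TEI. A minimal sketch (my addition; it assumes a second venv with tokenizers < 0.19 installed, matching the older tokenizers crate bundled in TEI v1.2.x) that tries to parse the exported tokenizer.json directly:

```python
# Try to parse the re-exported tokenizer.json with the locally installed
# tokenizers version. Under tokenizers < 0.19 this raises the same
# "data did not match any variant of untagged enum PreTokenizerWrapper"
# error that TEI reports above.
from tokenizers import Tokenizer

try:
    Tokenizer.from_file("multilingual-e5-large/tokenizer.json")
    print("tokenizer.json parsed successfully")
except Exception as exc:
    print(f"failed to parse tokenizer.json: {exc}")
```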
Expected behavior
The server starts and works without problems, just as when running it on the Hub version `intfloat/multilingual-e5-large` directly.
Problem Analysis
The issue seems to be a breaking change in the tokenizers library (probably https://github.com/huggingface/tokenizers/pull/1476) which prevents an XLM-Roberta tokenizer saved with a version >= 0.19.0 to be loaded by older tokenizers versions.
Proposed solution: upgrading tokenizers to 0.19.1
That makes the server start again normally. I'd like to know from you whether that's sound or whether it would require other dependency upgrades?
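To see where the serializations diverge, here is a small inspection sketch (my own addition; `PreTokenizerWrapper` in the error message is the Rust enum that deserializes exactly this section, and the reported position, line 69, column 3, presumably falls inside it). Per the PR linked above, tokenizers >= 0.19 changed the serialized fields of the Metaspace pre-tokenizer, which older versions cannot deserialize:

```python
# Print the pre_tokenizer section of the re-exported tokenizer.json to
# inspect the fields written by tokenizers >= 0.19.
import json

with open("multilingual-e5-large/tokenizer.json") as f:
    config = json.load(f)

print(json.dumps(config["pre_tokenizer"], indent=2))
```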
Pull request #266 fixes the problem (in the sense that the server can successfully load the re-exported model again).
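For anyone stuck on a released TEI version until then, one possible interim workaround (my own suggestion, not verified by the maintainers) is to overwrite the re-exported tokenizer.json with the original copy from the Hub, which was serialized with an older tokenizers version and still parses under TEI v1.2.x:

```python
# Replace the freshly exported tokenizer.json with the Hub's original copy,
# which predates the tokenizers 0.19 serialization change. The model files
# exported by sentence-transformers are left untouched.
import shutil
from huggingface_hub import hf_hub_download

original = hf_hub_download("intfloat/multilingual-e5-large", "tokenizer.json")
shutil.copy(original, "multilingual-e5-large/tokenizer.json")
```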
Now that the fix for this has been merged, would it be possible to cut a new release? @OlivierDehaene
Yes, I will cut a release today.