text-embeddings-inference
multilingual-e5-large exported by a recent sentence-transformers version cannot be loaded
System Info
Tested TEI versions:
- v1.2.0 (official Docker)
- v1.2.3 (official Docker)
- cc1c510 (current main, built on Ubuntu 23.10, cargo 1.75.0)
Since the failure already occurs during model loading, the hardware specs should not matter.
Information
- [X] Docker
- [X] The CLI directly
Tasks
- [X] An officially supported command
- [ ] My own modifications
Reproduction
- Install a Python 3.11 venv with up-to-date sentence-transformers and tokenizers:

  ```
  sentence-transformers==2.7.0
  tokenizers==0.19.1
  transformers==4.40.2
  ```
- Load `intfloat/multilingual-e5-large` and export it again to disk:

  ```python
  from sentence_transformers import SentenceTransformer

  e5 = SentenceTransformer("intfloat/multilingual-e5-large")
  e5.save("multilingual-e5-large")
  ```
- Run TEI on the exported model. The server does not start and emits the following:
  ```
  tokenizer.json not found. text-embeddings-inference only supports fast tokenizers: Error("data did not match any variant of untagged enum PreTokenizerWrapper", line: 69, column: 3)
  stack backtrace:
     0: rust_begin_unwind
              at /build/rustc-mQ6oHL/rustc-1.75.0+dfsg0ubuntu1~bpo10/library/std/src/panicking.rs:645:5
     1: core::panicking::panic_fmt
              at /build/rustc-mQ6oHL/rustc-1.75.0+dfsg0ubuntu1~bpo10/library/core/src/panicking.rs:72:14
     2: core::result::unwrap_failed
              at /build/rustc-mQ6oHL/rustc-1.75.0+dfsg0ubuntu1~bpo10/library/core/src/result.rs:1653:5
     3: core::result::Result<T,E>::expect
              at /build/rustc-mQ6oHL/rustc-1.75.0+dfsg0ubuntu1~bpo10/library/core/src/result.rs:1034:23
     4: text_embeddings_router::run::{{closure}}
              at ./router/src/lib.rs:137:25
     5: text_embeddings_router::main::{{closure}}
              at ./router/src/main.rs:163:6
     6: tokio::runtime::park::CachedParkThread::block_on::{{closure}}
              at /home/jvass/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/park.rs:281:63
     7: tokio::runtime::coop::with_budget
              at /home/jvass/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/coop.rs:107:5
     8: tokio::runtime::coop::budget
              at /home/jvass/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/coop.rs:73:5
     9: tokio::runtime::park::CachedParkThread::block_on
              at /home/jvass/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/park.rs:281:31
    10: tokio::runtime::context::blocking::BlockingRegionGuard::block_on
              at /home/jvass/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/context/blocking.rs:66:9
    11: tokio::runtime::scheduler::multi_thread::MultiThread::block_on::{{closure}}
              at /home/jvass/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/scheduler/multi_thread/mod.rs:87:13
    12: tokio::runtime::context::runtime::enter_runtime
              at /home/jvass/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/context/runtime.rs:65:16
    13: tokio::runtime::scheduler::multi_thread::MultiThread::block_on
              at /home/jvass/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/scheduler/multi_thread/mod.rs:86:9
    14: tokio::runtime::runtime::Runtime::block_on
              at /home/jvass/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/runtime.rs:351:45
    15: text_embeddings_router::main
              at ./router/src/main.rs:165:5
    16: core::ops::function::FnOnce::call_once
              at /build/rustc-mQ6oHL/rustc-1.75.0+dfsg0ubuntu1~bpo10/library/core/src/ops/function.rs:250:5
  note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
  ```
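The parse failure can also be reproduced outside of TEI. A minimal sketch (my addition; it assumes a second venv with tokenizers < 0.19 installed, matching the older tokenizers crate bundled in TEI v1.2.x) that tries to parse the exported tokenizer.json directly:

```python
# Try to parse the re-exported tokenizer.json with the locally installed
# tokenizers version. Under tokenizers < 0.19 this raises the same
# "data did not match any variant of untagged enum PreTokenizerWrapper"
# error that TEI reports above.
from tokenizers import Tokenizer

try:
    Tokenizer.from_file("multilingual-e5-large/tokenizer.json")
    print("tokenizer.json parsed successfully")
except Exception as exc:
    print(f"failed to parse tokenizer.json: {exc}")
```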
Expected behavior
The server starts and works without problems, just as when running it on the Hub version `intfloat/multilingual-e5-large` directly.
Problem Analysis
The issue seems to be a breaking change in the tokenizers library (probably https://github.com/huggingface/tokenizers/pull/1476) which prevents an XLM-Roberta tokenizer saved with a version >= 0.19.0 to be loaded by older tokenizers versions.
Proposed solution: upgrading tokenizers to 0.19.1
That makes the server start again normally. I'd like to know from you whether that's sound or whether it would require other dependency upgrades?
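To see where the serializations diverge, here is a small inspection sketch (my own addition; `PreTokenizerWrapper` in the error message is the Rust enum that deserializes exactly this section, and the reported position, line 69, column 3, presumably falls inside it). Per the PR linked above, tokenizers >= 0.19 changed the serialized fields of the Metaspace pre-tokenizer, which older versions cannot deserialize:

```python
# Print the pre_tokenizer section of the re-exported tokenizer.json to
# inspect the fields written by tokenizers >= 0.19.
import json

with open("multilingual-e5-large/tokenizer.json") as f:
    config = json.load(f)

print(json.dumps(config["pre_tokenizer"], indent=2))
```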
Pull request #266 fixes the problem (in the sense that the server can successfully load the re-exported model again).
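For anyone stuck on a released TEI version until then, one possible interim workaround (my own suggestion, not verified by the maintainers) is to overwrite the re-exported tokenizer.json with the original copy from the Hub, which was serialized with an older tokenizers version and still parses under TEI v1.2.x:

```python
# Replace the freshly exported tokenizer.json with the Hub's original copy,
# which predates the tokenizers 0.19 serialization change. The model files
# exported by sentence-transformers are left untouched.
import shutil
from huggingface_hub import hf_hub_download

original = hf_hub_download("intfloat/multilingual-e5-large", "tokenizer.json")
shutil.copy(original, "multilingual-e5-large/tokenizer.json")
```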
Now that the fix for this has been merged, would it be possible to cut a new release? @OlivierDehaene
Yes, I will cut a release today.