
multilingual-e5-large exported by recent sentence-transformers version cannot be loaded

scriptator opened this issue • 2 comments

System Info

Tested TEI versions:

  • v1.2.0 (official Docker)
  • v1.2.3 (official Docker)
  • cc1c510 (current main, built on Ubuntu 23.10, cargo 1.75.0)

Since the failure already occurs during model loading, hardware specs should not matter.

Information

  • [X] Docker
  • [X] The CLI directly

Tasks

  • [X] An officially supported command
  • [ ] My own modifications

Reproduction

  1. Install a Python 3.11 venv with up-to-date sentence-transformers and tokenizers:
sentence-transformers==2.7.0
tokenizers==0.19.1
transformers==4.40.2
  2. Load intfloat/multilingual-e5-large and export it again to disk:
from sentence_transformers import SentenceTransformer
e5 = SentenceTransformer("intfloat/multilingual-e5-large")
e5.save("multilingual-e5-large")
  3. Run TEI on the exported model. The server does not start and emits the following error:
tokenizer.json not found. text-embeddings-inference only supports fast tokenizers: Error("data did not match any variant of untagged enum PreTokenizerWrapper", line: 69, column: 3)
stack backtrace:
   0: rust_begin_unwind
             at /build/rustc-mQ6oHL/rustc-1.75.0+dfsg0ubuntu1~bpo10/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /build/rustc-mQ6oHL/rustc-1.75.0+dfsg0ubuntu1~bpo10/library/core/src/panicking.rs:72:14
   2: core::result::unwrap_failed
             at /build/rustc-mQ6oHL/rustc-1.75.0+dfsg0ubuntu1~bpo10/library/core/src/result.rs:1653:5
   3: core::result::Result<T,E>::expect
             at /build/rustc-mQ6oHL/rustc-1.75.0+dfsg0ubuntu1~bpo10/library/core/src/result.rs:1034:23
   4: text_embeddings_router::run::{{closure}}
             at ./router/src/lib.rs:137:25
   5: text_embeddings_router::main::{{closure}}
             at ./router/src/main.rs:163:6
   6: tokio::runtime::park::CachedParkThread::block_on::{{closure}}
             at /home/jvass/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/park.rs:281:63
   7: tokio::runtime::coop::with_budget
             at /home/jvass/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/coop.rs:107:5
   8: tokio::runtime::coop::budget
             at /home/jvass/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/coop.rs:73:5
   9: tokio::runtime::park::CachedParkThread::block_on
             at /home/jvass/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/park.rs:281:31
  10: tokio::runtime::context::blocking::BlockingRegionGuard::block_on
             at /home/jvass/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/context/blocking.rs:66:9
  11: tokio::runtime::scheduler::multi_thread::MultiThread::block_on::{{closure}}
             at /home/jvass/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/scheduler/multi_thread/mod.rs:87:13
  12: tokio::runtime::context::runtime::enter_runtime
             at /home/jvass/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/context/runtime.rs:65:16
  13: tokio::runtime::scheduler::multi_thread::MultiThread::block_on
             at /home/jvass/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/scheduler/multi_thread/mod.rs:86:9
  14: tokio::runtime::runtime::Runtime::block_on
             at /home/jvass/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/runtime.rs:351:45
  15: text_embeddings_router::main
             at ./router/src/main.rs:165:5
  16: core::ops::function::FnOnce::call_once
             at /build/rustc-mQ6oHL/rustc-1.75.0+dfsg0ubuntu1~bpo10/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
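For completeness, step 3 can be run roughly as follows. This is a sketch, not the exact command from the report: the image tag, port mapping, and mount paths are assumptions and need to be adapted to your setup (the exported model directory from step 2 must be visible inside the container).

```shell
# Mount the exported model directory and point TEI at it via --model-id.
# Assumes the export from step 2 lives in $PWD/multilingual-e5-large and
# that the official CPU image is used; adjust tag and paths as needed.
docker run --rm -p 8080:80 \
  -v "$PWD/multilingual-e5-large:/data/multilingual-e5-large" \
  ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 \
  --model-id /data/multilingual-e5-large
```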

Expected behavior

The server starts and works without a problem, just as when it is run directly on the Hub model intfloat/multilingual-e5-large.

scriptator avatar May 15 '24 12:05 scriptator

Problem Analysis

The issue seems to be a breaking change in the tokenizers library (probably https://github.com/huggingface/tokenizers/pull/1476) that prevents an XLM-RoBERTa tokenizer saved with a version >= 0.19.0 from being loaded by older tokenizers versions.
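The error message points at the pre_tokenizer section of tokenizer.json (the "untagged enum PreTokenizerWrapper" failure at line 69). A minimal sketch of why an older, stricter deserializer would reject the newer file is below; the exact field names for the old and new Metaspace serialization are an assumption based on the linked PR, not verified against both library versions, and `loadable_by_old_parser` is a hypothetical stand-in, not real tokenizers code.

```python
# Old-style (tokenizers < 0.19) Metaspace pre-tokenizer entry, as it
# might appear in tokenizer.json. Field names are an assumption based on
# https://github.com/huggingface/tokenizers/pull/1476.
old_entry = {
    "type": "Metaspace",
    "replacement": "\u2581",
    "add_prefix_space": True,
    "prepend_scheme": "always",
}

# New-style (tokenizers >= 0.19) entry: "add_prefix_space" is dropped and
# a "split" field is added. An old parser that only knows the old field
# set cannot match this against any variant of its untagged enum.
new_entry = {
    "type": "Metaspace",
    "replacement": "\u2581",
    "prepend_scheme": "always",
    "split": True,
}

def loadable_by_old_parser(entry):
    """Crude stand-in for the old strict deserializer: accept only the
    exact field set the old format defined."""
    expected = {"type", "replacement", "add_prefix_space", "prepend_scheme"}
    return set(entry) == expected

print(loadable_by_old_parser(old_entry))  # True
print(loadable_by_old_parser(new_entry))  # False
```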

Proposed solution: upgrading tokenizers to 0.19.1

With that upgrade the server starts normally again. Could you confirm whether this change is sound, or whether it would require other dependency upgrades?
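For anyone stuck on an older TEI image until a fixed release ships, one possible user-side workaround is to rewrite the Metaspace entry in the exported tokenizer.json back to the legacy shape. This is a hypothetical sketch, not a documented procedure: the field names follow the same assumption about the 0.19 serialization change and should be checked against the actual file before use.

```python
import json
from pathlib import Path

def downgrade_metaspace(path):
    """Hypothetical helper: rewrite a >= 0.19-style Metaspace
    pre-tokenizer entry in tokenizer.json into the legacy shape so
    older parsers accept it. Field names are assumptions."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    pt = data.get("pre_tokenizer") or {}
    # The pre-tokenizer may be a single entry or a Sequence of entries.
    entries = pt.get("pretokenizers", [pt])
    for entry in entries:
        if entry.get("type") == "Metaspace" and "split" in entry:
            entry.pop("split", None)
            # Map the new prepend_scheme back to the old boolean flag.
            entry["add_prefix_space"] = (
                entry.get("prepend_scheme", "always") != "never"
            )
    Path(path).write_text(json.dumps(data, ensure_ascii=False),
                          encoding="utf-8")
```

After running this on the exported model's tokenizer.json, the older TEI build may be able to deserialize it again; upgrading TEI itself (as the merged fix does) remains the cleaner solution.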

scriptator avatar May 15 '24 13:05 scriptator

The pull request #266 fixes the problem (in the sense that the server can successfully load the new model again).

scriptator avatar May 15 '24 13:05 scriptator

Now that the fix for this has been merged, would it be possible to cut a new release? @OlivierDehaene

vrdn-23 avatar Jun 18 '24 18:06 vrdn-23

Yes, I will cut a release today.

OlivierDehaene avatar Jun 21 '24 07:06 OlivierDehaene