
SBERT-based MPNet model (related issue #33)

Open NewBornRustacean opened this issue 10 months ago • 12 comments

Model description

Hello! Thanks for this great work :)

Previously, I implemented mpnet-rs and found a related issue (feature request): #33

If there is no ongoing work on the MPNet model, I'd like to refactor my mpnet-rs crate and add it to text-embeddings-inference.

Have a nice day :)

Open source status

  • [X] The model implementation is available
  • [X] The model weights are available

Provide useful links for the implementation

NewBornRustacean avatar Apr 25 '24 09:04 NewBornRustacean

Hey, thanks for the awesome effort to add a new model! I'm also interested in using this model with TEI. Did you receive any follow-up from the maintainers?

vrdn-23 avatar May 03 '24 19:05 vrdn-23

Hello! Thanks for your comment! I didn't get any follow-up, but I'm just about to start coding. It might take some time to refactor. I'll open a draft PR ASAP!

NewBornRustacean avatar May 03 '24 20:05 NewBornRustacean

@NewBornRustacean Just wanted to know if you've made any progress on this?

vrdn-23 avatar May 29 '24 21:05 vrdn-23

Oh sorry man... I've been so busy with work lately that I haven't had a moment to spare. It's already been almost a month. I'm really sorry for keeping you waiting. I think I'll be able to get some work done either this weekend or next weekend.

NewBornRustacean avatar May 29 '24 21:05 NewBornRustacean

Adding MPNet would be great if you have the bandwidth.

OlivierDehaene avatar Jun 17 '24 14:06 OlivierDehaene

@OlivierDehaene @NewBornRustacean @vrdn-23 I implemented MPNet in #363!

Please check the PR when you're available :)

I checked the inference result in a CPU environment and it's (almost) identical to the output of the Transformers library (the small difference comes from the activation). However, I don't have a GPU, so I'm not sure about the result there (it probably works too).

kozistr avatar Jul 31 '24 01:07 kozistr
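One common way to check an "almost identical" claim like this is to compare cosine similarity between the two backends' embeddings for the same input. A minimal stdlib-only sketch (the ~1e-4 tolerance is an assumption, not a number from the thread):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length float vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Compare a TEI embedding against a reference embedding from the
# Transformers/sentence-transformers stack; a similarity within ~1e-4 of 1.0
# is typically fine, since activation kernels (e.g. exact vs. approximated
# GELU) introduce tiny numeric differences.
```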

@kozistr awesome! I'll check it out on GPU. thanks 👍

NewBornRustacean avatar Jul 31 '24 02:07 NewBornRustacean

thank you!

kozistr avatar Jul 31 '24 02:07 kozistr

@kozistr gm, thanks for your effort. I just tried to build your #363 PR on my A100 machine, but it seems backend-candle can't be built.

$ cargo install --path router -F candle-cuda -F http --no-default-features
...
error[E0599]: `&candle_core::Tensor` is not an iterator
   --> backends/candle/src/models/mpnet.rs:169:53
    |
169 |                 let attention_bias = attention_bias.map(|mask| mask.flatten(0, 1)).transpose()?;
    |                                                     ^^^ `&candle_core::Tensor` is not an iterator
    |
   ::: /root/.cargo/git/checkouts/candle-2c6db576e0f06e81/7e02ad8/candle-core/src/tensor.rs:23:1
    |
23  | pub struct Tensor_ {
    | ------------------ doesn't satisfy `candle_core::tensor::Tensor_: Iterator`
...
68  | pub struct Tensor(Arc<Tensor_>);
    | ----------------- doesn't satisfy `candle_core::Tensor: Iterator`
    |
    = note: the following trait bounds were not satisfied:
            `&candle_core::Tensor: Iterator`
            which is required by `&mut &candle_core::Tensor: Iterator`
            `candle_core::Tensor: Iterator`
            which is required by `&mut candle_core::Tensor: Iterator`
            `candle_core::tensor::Tensor_: Iterator`
            which is required by `&mut candle_core::tensor::Tensor_: Iterator`

error[E0599]: `&candle_core::Tensor` is not an iterator
   --> backends/candle/src/models/mpnet.rs:170:53
    |
170 |                 let attention_mask = attention_mask.map(|mask| mask.flatten(0, 1)).transpose()?;
    |                                                     ^^^ `&candle_core::Tensor` is not an iterator
    |
   ::: /root/.cargo/git/checkouts/candle-2c6db576e0f06e81/7e02ad8/candle-core/src/tensor.rs:23:1
    |
23  | pub struct Tensor_ {
    | ------------------ doesn't satisfy `candle_core::tensor::Tensor_: Iterator`
...
68  | pub struct Tensor(Arc<Tensor_>);
    | ----------------- doesn't satisfy `candle_core::Tensor: Iterator`
    |
    = note: the following trait bounds were not satisfied:
            `&candle_core::Tensor: Iterator`
            which is required by `&mut &candle_core::Tensor: Iterator`
            `candle_core::Tensor: Iterator`
            which is required by `&mut candle_core::Tensor: Iterator`
            `candle_core::tensor::Tensor_: Iterator`
            which is required by `&mut candle_core::tensor::Tensor_: Iterator`

error[E0308]: mismatched types
   --> backends/candle/src/models/mpnet.rs:180:21
    |
175 |                 let attention_scores = cublaslt.batch_matmul(
    |                                                 ------------ arguments to this method are incorrect
...
180 |                     1.0,
    |                     ^^^ expected `Option<f32>`, found floating-point number
    |
    = note: expected enum `std::option::Option<f32>`
               found type `{float}`
note: method defined here
   --> backends/candle/src/layers/cublaslt.rs:89:12
    |
89  |     pub fn batch_matmul(
    |            ^^^^^^^^^^^^
...
95  |         beta: Option<f32>,
    |         -----------------
help: try wrapping the expression in `Some`
    |
180 |                     Some(1.0),
    |                     +++++   +

Some errors have detailed explanations: E0308, E0599.
For more information about an error, try `rustc --explain E0308`.

I am using rustc 1.75.0 and commit 61581e6 from your forked repo. Thanks!

sigridjineth avatar Aug 03 '24 09:08 sigridjineth
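For readers hitting the same two errors, a stdlib-only sketch of the patterns the compiler is pointing at (all names here are illustrative, not taken from the PR):

```rust
// E0599: `.map(..).transpose()` only exists on `Option`, so the value being
// mapped must actually be an `Option<T>`, not a bare `&Tensor`-like value.
// `transpose()` then turns `Option<Result<T, E>>` into `Result<Option<T>, E>`
// so `?` can propagate the inner error.
fn flatten(mask: &str) -> Result<String, String> {
    Ok(mask.trim().to_string())
}

fn prepare(mask: Option<&str>) -> Result<Option<String>, String> {
    mask.map(flatten).transpose()
}

// E0308: a parameter declared as `beta: Option<f32>` must be called with
// `Some(1.0)` (or `None`), never a bare `1.0`.
fn scale(beta: Option<f32>, x: f32) -> f32 {
    beta.unwrap_or(1.0) * x
}
```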

Hi Jin, good to see you here, and thanks for the check! I just pushed a fix. Could you please check whether the build is okay now, when you have a chance?

Also, feel free to leave a review there!

  • TODO: try to build TEI on Colab

kozistr avatar Aug 03 '24 10:08 kozistr

Checked that it works on GPU (Colab T4). Thanks for your help :) @sigridjineth @NewBornRustacean

/content/text-embeddings-inference# ./target/release/text-embeddings-router --model-id sentence-transformers/all-mpnet-base-v2 --port 12345 --dtype float32
2024-08-03T12:27:23.840941Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "sen*****-************/***-*****-***e-v2", revision: None, tokenization_workers: None, dtype: Some(Float32), pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "1e004704a735", port: 12345, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: None, payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2024-08-03T12:27:23.841035Z  INFO hf_hub: /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
2024-08-03T12:27:23.931812Z  INFO download_pool_config: text_embeddings_core::download: core/src/download.rs:38: Downloading `1_Pooling/config.json`
2024-08-03T12:27:23.931919Z  INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:62: Downloading `config_sentence_transformers.json`
2024-08-03T12:27:23.931952Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:21: Starting download
2024-08-03T12:27:23.931961Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:23: Downloading `config.json`
2024-08-03T12:27:23.931986Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Downloading `tokenizer.json`
2024-08-03T12:27:23.932018Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:328: Downloading `model.safetensors`
2024-08-03T12:27:23.932059Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:32: Model artifacts downloaded in 107.206µs
2024-08-03T12:27:23.945643Z  INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 384
2024-08-03T12:27:23.945756Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 1 tokenization workers
2024-08-03T12:27:23.953234Z  INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
2024-08-03T12:27:24.408546Z  INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:390: Starting MPNet model on Cuda(CudaDevice(DeviceId(1)))
2024-08-03T12:27:26.233824Z  WARN text_embeddings_router: router/src/lib.rs:319: Invalid hostname, defaulting to 0.0.0.0
2024-08-03T12:27:26.237104Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1778: Starting HTTP server: 0.0.0.0:12345
2024-08-03T12:27:26.237197Z  INFO text_embeddings_router::http::server: router/src/http/server.rs:1779: Ready
2024-08-03T12:28:27.467762Z  INFO embed{total_time="15.229546ms" tokenization_time="205.029µs" queue_time="317.319µs" inference_time="14.620051ms"}: text_embeddings_router::http::server: router/src/http/server.rs:706: Success

kozistr avatar Aug 03 '24 12:08 kozistr
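With the router up on port 12345 as in the log above, embeddings can be fetched from TEI's `POST /embed` endpoint, which takes a JSON body of the form `{"inputs": ...}`. A small stdlib sketch (the host/port are taken from the log; the helper name is hypothetical):

```python
import json
from urllib import request

def build_embed_request(base_url, inputs):
    """Build a POST request for a TEI router's /embed endpoint."""
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    return request.Request(
        f"{base_url}/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With a running router:
#   resp = request.urlopen(build_embed_request("http://0.0.0.0:12345", "Hello"))
#   embeddings = json.loads(resp.read())  # a list of float vectors
```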

@kozistr working well on Ampere A100 devices. Thanks for your effort!

   Compiling text-embeddings-core v1.5.0 (/workspace/sigrid/text-embeddings-inference/core)
    Finished release [optimized] target(s) in 3m 05s
  Installing /root/.cargo/bin/text-embeddings-router
   Installed package `text-embeddings-router v1.5.0 (/workspace/sigrid/text-embeddings-inference/router)` (executable `text-embeddings-router`)
root@e9572049f4f6:/workspace/sigrid/text-embeddings-inference# ./target/release/text-embeddings-router --model-id sentence-transformers/all-mpnet-base-v2 --port 12345 --dtype float32
2024-08-05T01:40:27.036537Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "sen*****-************/***-*****-***e-v2", revision: None, tokenization_workers: None, dtype: Some(Float32), pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "e9572049f4f6", port: 12345, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: None, payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2024-08-05T01:40:27.126233Z  INFO download_pool_config: text_embeddings_core::download: core/src/download.rs:38: Downloading `1_Pooling/config.json`
2024-08-05T01:40:28.159694Z  INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:62: Downloading `config_sentence_transformers.json`
2024-08-05T01:40:28.541071Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:21: Starting download
2024-08-05T01:40:28.541084Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:23: Downloading `config.json`
2024-08-05T01:40:28.925827Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Downloading `tokenizer.json`
2024-08-05T01:40:29.655061Z  INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:328: Downloading `model.safetensors`
2024-08-05T01:40:35.321803Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:32: Model artifacts downloaded in 6.780728661s
2024-08-05T01:40:35.333659Z  INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 384
2024-08-05T01:40:35.335509Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 128 tokenization workers
2024-08-05T01:40:35.817922Z  INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
2024-08-05T01:40:38.027045Z  INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:390: Starting MPNet model on Cuda(CudaDevice(DeviceId(1)))

sigridjineth avatar Aug 05 '24 01:08 sigridjineth