text-embeddings-inference
SBERT-based MPNet model (related issue #33)
Model description
Hello! Thanks for this great work :)
Previously, I implemented mpnet-rs and found a related issue (feature request), #33.
If there is no ongoing work on the MPNet model, I'd like to refactor my mpnet-rs crate and add it to text-embeddings-inference.
Have a nice day :)
Open source status
- [X] The model implementation is available
- [X] The model weights are available
Provide useful links for the implementation
Hey, thanks for the awesome effort to add a new model! I'm also interested in using this model with TEI. Did you receive any follow-up from the maintainers?
Hello! Thanks for your comment! I didn't get any follow-up, but I'm just about to start coding. It might take some time to refactor. I'll open a draft PR ASAP!
@NewBornRustacean Just wanted to know if you've made any progress on this?
Oh sorry man... I've been so busy with work lately that I haven't had a moment to spare. It's already been almost a month. I'm really sorry for keeping you waiting. I think I'll be able to get some work done either this weekend or next weekend.
Adding MPNet would be great if you have the bandwidth.
@OlivierDehaene @NewBornRustacean @vrdn-23 I implemented MPNet in #363!
Please check the PR when you're available :)
I checked the inference results in a CPU environment and they're (almost) identical to the output of the Transformers library (small differences due to the activation). However, I don't have a GPU, so I'm not sure about the results there (maybe it works too).
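For reference, one way to quantify "(almost) identical" is the cosine similarity between a TEI embedding and the corresponding sentence-transformers output, which should be ~1.0 up to numerical differences. A minimal sketch (hypothetical helper and toy vectors, not part of the PR):

```rust
// Cosine similarity between two embedding vectors; values near 1.0 mean
// the two models agree up to floating-point/activation differences.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    // Toy vectors standing in for a TEI embedding and a
    // sentence-transformers embedding of the same input.
    let tei = [0.12_f32, -0.34, 0.56];
    let hf = [0.12_f32, -0.34, 0.55];
    println!("cosine similarity: {:.6}", cosine_similarity(&tei, &hf));
}
```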
@kozistr awesome! I'll check it out on GPU. thanks 👍
thank you!
@kozistr gm, thanks for your effort. I just tried to build your #363 PR on my A100 machines, but it seems like backend-candle can't be built.
$ cargo install --path router -F candle-cuda -F http --no-default-features
...
error[E0599]: `&candle_core::Tensor` is not an iterator
--> backends/candle/src/models/mpnet.rs:169:53
|
169 | let attention_bias = attention_bias.map(|mask| mask.flatten(0, 1)).transpose()?;
| ^^^ `&candle_core::Tensor` is not an iterator
|
::: /root/.cargo/git/checkouts/candle-2c6db576e0f06e81/7e02ad8/candle-core/src/tensor.rs:23:1
|
23 | pub struct Tensor_ {
| ------------------ doesn't satisfy `candle_core::tensor::Tensor_: Iterator`
...
68 | pub struct Tensor(Arc<Tensor_>);
| ----------------- doesn't satisfy `candle_core::Tensor: Iterator`
|
= note: the following trait bounds were not satisfied:
`&candle_core::Tensor: Iterator`
which is required by `&mut &candle_core::Tensor: Iterator`
`candle_core::Tensor: Iterator`
which is required by `&mut candle_core::Tensor: Iterator`
`candle_core::tensor::Tensor_: Iterator`
which is required by `&mut candle_core::tensor::Tensor_: Iterator`
error[E0599]: `&candle_core::Tensor` is not an iterator
--> backends/candle/src/models/mpnet.rs:170:53
|
170 | let attention_mask = attention_mask.map(|mask| mask.flatten(0, 1)).transpose()?;
| ^^^ `&candle_core::Tensor` is not an iterator
|
::: /root/.cargo/git/checkouts/candle-2c6db576e0f06e81/7e02ad8/candle-core/src/tensor.rs:23:1
|
23 | pub struct Tensor_ {
| ------------------ doesn't satisfy `candle_core::tensor::Tensor_: Iterator`
...
68 | pub struct Tensor(Arc<Tensor_>);
| ----------------- doesn't satisfy `candle_core::Tensor: Iterator`
|
= note: the following trait bounds were not satisfied:
`&candle_core::Tensor: Iterator`
which is required by `&mut &candle_core::Tensor: Iterator`
`candle_core::Tensor: Iterator`
which is required by `&mut candle_core::Tensor: Iterator`
`candle_core::tensor::Tensor_: Iterator`
which is required by `&mut candle_core::tensor::Tensor_: Iterator`
error[E0308]: mismatched types
--> backends/candle/src/models/mpnet.rs:180:21
|
175 | let attention_scores = cublaslt.batch_matmul(
| ------------ arguments to this method are incorrect
...
180 | 1.0,
| ^^^ expected `Option<f32>`, found floating-point number
|
= note: expected enum `std::option::Option<f32>`
found type `{float}`
note: method defined here
--> backends/candle/src/layers/cublaslt.rs:89:12
|
89 | pub fn batch_matmul(
| ^^^^^^^^^^^^
...
95 | beta: Option<f32>,
| -----------------
help: try wrapping the expression in `Some`
|
180 | Some(1.0),
| +++++ +
Some errors have detailed explanations: E0308, E0599.
For more information about an error, try `rustc --explain E0308`.
I am using rustc 1.75.0 and commit 61581e6 from your forked repo. Thanks!
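For reference, the two error classes above boil down to (1) `.map` resolving to `Iterator::map` when called on a plain `&Tensor` where `Option::map` was intended, and (2) a `beta: Option<f32>` argument passed as a bare float. A minimal, self-contained illustration of both (hypothetical, not the actual PR diff), using candle-core directly:

```rust
use candle_core::{DType, Device, Tensor};

fn main() -> candle_core::Result<()> {
    let t = Tensor::zeros((2, 3, 4), DType::F32, &Device::Cpu)?;

    // E0599: `t.map(...)` on a plain `&Tensor` falls through to
    // `Iterator::map` and fails to compile. If the value is a plain
    // tensor, just flatten it directly:
    let flat = t.flatten(0, 1)?; // shape (6, 4)

    // `Option::map` + `transpose` only type-checks on an `Option<Tensor>`:
    let maybe: Option<Tensor> = Some(flat);
    let _flattened = maybe.map(|m| m.flatten(0, 1)).transpose()?;

    // E0308: a `beta: Option<f32>` parameter (as in `batch_matmul`) needs
    // the literal wrapped, i.e. `Some(1.0)` instead of `1.0`, exactly as
    // rustc suggests.
    Ok(())
}
```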
Hi Jin, good to see you here. Thanks for the check! I just made a fix. Could you please check whether the build is okay now, if you're available?
Also, you can leave a review here!
- gotta try building TEI on Colab
Checked it's working on the GPU (Colab T4). Thanks for your help :) @sigridjineth @NewBornRustacean
/content/text-embeddings-inference# ./target/release/text-embeddings-router --model-id sentence-transformers/all-mpnet-base-v2 --port 12345 --dtype float32
2024-08-03T12:27:23.840941Z INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "sen*****-************/***-*****-***e-v2", revision: None, tokenization_workers: None, dtype: Some(Float32), pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "1e004704a735", port: 12345, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: None, payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2024-08-03T12:27:23.841035Z INFO hf_hub: /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
2024-08-03T12:27:23.931812Z INFO download_pool_config: text_embeddings_core::download: core/src/download.rs:38: Downloading `1_Pooling/config.json`
2024-08-03T12:27:23.931919Z INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:62: Downloading `config_sentence_transformers.json`
2024-08-03T12:27:23.931952Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:21: Starting download
2024-08-03T12:27:23.931961Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:23: Downloading `config.json`
2024-08-03T12:27:23.931986Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Downloading `tokenizer.json`
2024-08-03T12:27:23.932018Z INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:328: Downloading `model.safetensors`
2024-08-03T12:27:23.932059Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:32: Model artifacts downloaded in 107.206µs
2024-08-03T12:27:23.945643Z INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 384
2024-08-03T12:27:23.945756Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 1 tokenization workers
2024-08-03T12:27:23.953234Z INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
2024-08-03T12:27:24.408546Z INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:390: Starting MPNet model on Cuda(CudaDevice(DeviceId(1)))
2024-08-03T12:27:26.233824Z WARN text_embeddings_router: router/src/lib.rs:319: Invalid hostname, defaulting to 0.0.0.0
2024-08-03T12:27:26.237104Z INFO text_embeddings_router::http::server: router/src/http/server.rs:1778: Starting HTTP server: 0.0.0.0:12345
2024-08-03T12:27:26.237197Z INFO text_embeddings_router::http::server: router/src/http/server.rs:1779: Ready
2024-08-03T12:28:27.467762Z INFO embed{total_time="15.229546ms" tokenization_time="205.029µs" queue_time="317.319µs" inference_time="14.620051ms"}: text_embeddings_router::http::server: router/src/http/server.rs:706: Success
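The `Success` line above came from a request against the `/embed` route. A minimal client sketch of that kind of request, assuming `reqwest` (with the `blocking` and `json` features) and `serde_json` as dependencies:

```rust
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();

    // POST /embed with a single input, matching the router started above
    // on port 12345.
    let resp: Vec<Vec<f32>> = client
        .post("http://0.0.0.0:12345/embed")
        .json(&json!({ "inputs": "What is Deep Learning?" }))
        .send()?
        .error_for_status()?
        .json()?;

    // all-mpnet-base-v2 produces 768-dimensional embeddings, so the
    // inner length should print 768.
    println!("got {} embedding(s) of dim {}", resp.len(), resp[0].len());
    Ok(())
}
```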
@kozistr Working well on Ampere A100 devices. Thanks for your effort!
Compiling text-embeddings-core v1.5.0 (/workspace/sigrid/text-embeddings-inference/core)
Finished release [optimized] target(s) in 3m 05s
Installing /root/.cargo/bin/text-embeddings-router
Installed package `text-embeddings-router v1.5.0 (/workspace/sigrid/text-embeddings-inference/router)` (executable `text-embeddings-router`)
root@e9572049f4f6:/workspace/sigrid/text-embeddings-inference# ./target/release/text-embeddings-router --model-id sentence-transformers/all-mpnet-base-v2 --port 12345 --dtype float32
2024-08-05T01:40:27.036537Z INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "sen*****-************/***-*****-***e-v2", revision: None, tokenization_workers: None, dtype: Some(Float32), pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "e9572049f4f6", port: 12345, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: None, payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2024-08-05T01:40:27.126233Z INFO download_pool_config: text_embeddings_core::download: core/src/download.rs:38: Downloading `1_Pooling/config.json`
2024-08-05T01:40:28.159694Z INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:62: Downloading `config_sentence_transformers.json`
2024-08-05T01:40:28.541071Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:21: Starting download
2024-08-05T01:40:28.541084Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:23: Downloading `config.json`
2024-08-05T01:40:28.925827Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Downloading `tokenizer.json`
2024-08-05T01:40:29.655061Z INFO download_artifacts: text_embeddings_backend: backends/src/lib.rs:328: Downloading `model.safetensors`
2024-08-05T01:40:35.321803Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:32: Model artifacts downloaded in 6.780728661s
2024-08-05T01:40:35.333659Z INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 384
2024-08-05T01:40:35.335509Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:28: Starting 128 tokenization workers
2024-08-05T01:40:35.817922Z INFO text_embeddings_router: router/src/lib.rs:241: Starting model backend
2024-08-05T01:40:38.027045Z INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:390: Starting MPNet model on Cuda(CudaDevice(DeviceId(1)))