
feat: support huggingface/text-embeddings-inference for faster embedding inference

liwenshipro opened this issue · 1 comment

Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5. TEI implements many features such as:

  • No model graph compilation step
  • Metal support for local execution on Macs
  • Small docker images and fast boot times. Get ready for true serverless!
  • Token based dynamic batching
  • Optimized transformers code for inference using Flash Attention, Candle and cuBLASLt
  • Safetensors weight loading
  • Production ready (distributed tracing with Open Telemetry, Prometheus metrics)
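Since TEI is served over HTTP, integrating it amounts to POSTing texts to its `/embed` endpoint. The following is a minimal sketch of such a client, assuming a TEI server is already running locally on port 8080 (the endpoint path and JSON shape follow TEI's documented API; the helper names and default URL here are illustrative, not part of this PR):

```python
import json
import urllib.request

# Illustrative default; point this at wherever your TEI container runs.
TEI_URL = "http://localhost:8080"

def build_embed_payload(texts):
    """Build the JSON body for TEI's POST /embed endpoint.

    TEI accepts a list of input strings; `truncate` asks the server to
    clip inputs that exceed the model's maximum sequence length.
    """
    return {"inputs": texts, "truncate": True}

def tei_embed(texts, url=TEI_URL):
    """Return one embedding vector (list of floats) per input text."""
    body = json.dumps(build_embed_payload(texts)).encode("utf-8")
    req = urllib.request.Request(
        f"{url}/embed",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Because the embedding work happens inside the TEI server, the client side stays dependency-free: no model weights, tokenizers, or GPU libraries need to be installed alongside ModelCache.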

This PR adds TEI support to ModelCache for faster embedding inference; the speedup is shown in the attached benchmark image.

liwenshipro · May 24, 2024

Thank you for participating in the ModelCache open-source project; we welcome your involvement, and the addition of huggingface/text-embeddings-inference is a good idea. We offer two suggestions regarding your submission:

1. Using `TextEmbeddingsInference` as a class name and `text_embeddings_inference` as a variable name for LazyImport is somewhat generic, and users may confuse it with other embedding backends. We recommend a more distinctive name, such as `HuggingfaceTEI` or `Huggingface_TEI`, to improve recognizability.
2. Given the use of URL requests, we recommend adding an example to the `examples/embedding` directory. I have already added that directory; you can pull the latest main branch to obtain it.
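The renaming suggestion might look like the sketch below. This is a hypothetical illustration only: the class name `HuggingfaceTEI`, the `to_embeddings`/`dimension` interface, and the defaults are assumptions for the sake of the example, not the actual ModelCache embedding API or the code in this PR:

```python
import json
import urllib.request

class HuggingfaceTEI:
    """Embedding backend that delegates to a running TEI HTTP server.

    The distinctive class name makes clear this wraps
    huggingface/text-embeddings-inference rather than a
    local transformers model.
    """

    def __init__(self, base_url="http://localhost:8080", dimension=768):
        # base_url and dimension are illustrative defaults; the real
        # dimension depends on which model the TEI server is hosting.
        self.base_url = base_url
        self._dimension = dimension

    @property
    def dimension(self):
        """Size of the vectors this backend produces."""
        return self._dimension

    def to_embeddings(self, text):
        """Embed a single string via TEI's POST /embed endpoint."""
        body = json.dumps({"inputs": [text]}).encode("utf-8")
        req = urllib.request.Request(
            f"{self.base_url}/embed",
            data=body,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            # TEI returns a list of vectors; we sent one input.
            return json.loads(resp.read())[0]
```

A class with a backend-specific name also keeps the door open for other remote embedding services to be added later without naming collisions.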

peng3307165 · May 24, 2024

We have merged your commit into the main branch. Thank you for your contributions to the ModelCache project. Best wishes!

peng3307165 · Sep 14, 2024