feat: support huggingface/text-embeddings-inference for faster embedding inference
Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5. TEI implements many features such as:
- No model graph compilation step
- Metal support for local execution on Macs
- Small docker images and fast boot times. Get ready for true serverless!
- Token based dynamic batching
- Optimized transformers code for inference using Flash Attention, Candle and cuBLASLt
- Safetensors weight loading
- Production ready (distributed tracing with Open Telemetry, Prometheus metrics)
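Because TEI serves embeddings over a plain HTTP API, wiring it into ModelCache reduces to a small HTTP client. Below is a minimal sketch, assuming a TEI container is already serving a model at `http://127.0.0.1:8080` (the port mapping used in TEI's own docs); the `/embed` route and `{"inputs": ...}` payload follow TEI's documented API, while the helper name `tei_embed` is purely illustrative:

```python
import requests


def tei_embed(texts, base_url="http://127.0.0.1:8080"):
    """Return one embedding vector per input text from a running TEI server."""
    # TEI's documented /embed route accepts {"inputs": <str or list[str]>}
    # and returns a JSON array of float arrays, one per input.
    resp = requests.post(f"{base_url}/embed", json={"inputs": texts}, timeout=30)
    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    vectors = tei_embed(["What is ModelCache?", "TEI speeds up embedding inference."])
    print(len(vectors), "embeddings of dimension", len(vectors[0]))
```

Because the model stays resident in the TEI server, repeated embedding calls avoid Python-side model loading and benefit from TEI's token-based dynamic batching.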
This PR adds TEI support to ModelCache for faster embedding inference; the resulting speedup is shown below:
Thank you for participating in the ModelCache open-source project; we welcome your involvement, and the addition of huggingface/text-embeddings-inference is a good idea. We offer two suggestions regarding your submission:
1. Using `TextEmbeddingsInference` as a class name and `text_embeddings_inference` as a variable name for `LazyImport` is somewhat generic, and users may confuse these with general concepts. We recommend more distinctive names, such as `HuggingfaceTEI` or `Huggingface_TEI`, to improve recognizability.
2. Given the use of URL requests, we recommend adding an example to the `examples/embedding` directory. I have already added that directory, so you can pull the latest main branch to obtain it. A sketch of such an example follows this list.
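A hedged sketch of what such an example might look like, assuming the class is renamed to `HuggingfaceTEI` as suggested above and that it follows a `to_embeddings()` / `dimension` shape like ModelCache's other embedding wrappers; the constructor argument, method names, and file path here are illustrative assumptions, not the merged API:

```python
# Hypothetical script for examples/embedding/, e.g. huggingface_tei_demo.py.
import numpy as np
import requests


class HuggingfaceTEI:
    """Embedding wrapper backed by a huggingface/text-embeddings-inference server."""

    def __init__(self, base_url="http://127.0.0.1:8080"):
        self.base_url = base_url
        # Probe once so the vector dimension is known up front.
        self.__dimension = self.to_embeddings("dimension probe").shape[0]

    def to_embeddings(self, data, **_):
        # One URL request per call, matching the PR's request-based design.
        resp = requests.post(
            f"{self.base_url}/embed", json={"inputs": data}, timeout=30
        )
        resp.raise_for_status()
        return np.array(resp.json()[0], dtype="float32")

    @property
    def dimension(self):
        return self.__dimension


if __name__ == "__main__":
    tei = HuggingfaceTEI()
    vec = tei.to_embeddings("Hello, ModelCache!")
    print("dimension:", tei.dimension, "| norm:", float(np.linalg.norm(vec)))
```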
We have merged your commit into the main branch. Thank you for your contributions to the ModelCache project. Best wishes!