Sentence Transformers based mpnet models
Feature request
The Sentence Transformers-based mpnet models are pretty popular for fast and cheap embeddings. It would be really helpful to support these models, at a minimum those using the MPNet architecture, within the text embeddings interface.
Motivation
As sentence-transformers/multi-qa-mpnet-base-cos-v1 shows, these models are still pretty popular. For anyone using them in production with a semantic search stack, it'd be preferable to support these models with a much faster inference stack than to migrate to a new embedding model that might have different performance characteristics.
Your contribution
Once a CONTRIBUTING.md is written (it does not appear to exist at the time of writing) I can likely find time to contribute to making this happen. I would benefit from some details about code organization and style preferences.
Ah geez, I saw there's a different category for this type of request. Sorry.
+1 from me. Since SentenceTransformers recommends all-mpnet-base-v2, I imagine lots of projects, like ours, are using that model.
Currently, trying to run the Docker image with `--model-id sentence-transformers/all-mpnet-base-v2` gives me:
```
2023-10-25T13:37:12.148493Z INFO text_embeddings_router: router/src/main.rs:246: Starting model backend
Error: Could not create backend
Caused by:
Could not start backend: missing field `type_vocab_size` at line 23 column 1
```
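For context, the error above comes from a strict config parser rejecting a `config.json` that omits a field. The following is a minimal sketch of that failure mode, assuming a required-field check like the one the message implies; only `type_vocab_size` is taken from the actual error, and the field list, default value, and parsing logic are illustrative rather than TEI's real code:

```python
import json

# Illustrative required fields; only `type_vocab_size` comes from the
# error message above, the other names are placeholders.
REQUIRED_FIELDS = ("hidden_size", "num_attention_heads", "type_vocab_size")

def parse_config_strict(raw: str) -> dict:
    """Reject a config that omits any required field (mirrors the error)."""
    cfg = json.loads(raw)
    for field in REQUIRED_FIELDS:
        if field not in cfg:
            raise ValueError(f"missing field `{field}`")
    return cfg

def parse_config_lenient(raw: str) -> dict:
    """Fill in a BERT-style default when the field is absent."""
    cfg = json.loads(raw)
    cfg.setdefault("type_vocab_size", 2)  # assumed default, for illustration only
    return cfg

# An MPNet-style config without `type_vocab_size` fails strict parsing
# but passes once a default is filled in.
mpnet_like = '{"hidden_size": 768, "num_attention_heads": 12}'
```

Under this reading, the fix is either adding the missing field to the model's `config.json` or having the backend tolerate its absence.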
> Since SentenceTransformers recommends all-mpnet-base-v2, I imagine lots of projects, like ours, are using that model.
Worth adding that all-mpnet-base-v2 is the most downloaded model on the Hub for the "Sentence Similarity" task, with over 10M downloads just last month: https://huggingface.co/models?pipeline_tag=sentence-similarity&sort=downloads
@OlivierDehaene sorry for the direct ping, but could you share a spitball guess on how hard it would be to add support for all-mpnet-base-v2? Wondering if this is something we could try, wait for, or give up on.
It should be ok to add. Candle and PyTorch do not have feature parity, but Candle should have everything needed to support it. You won't be able to use flash attention though, so it will be slower than other models.
> You won't be able to use flash attention though, so it will be slower than other models.
Thanks for mentioning that. In that case it may be worth biting the bullet and migrating to BGE now.
@OlivierDehaene can you shed some light on what needs to be done to make any embedding model compatible with text-embeddings-inference?
This will help the community pitch in to grow the set of compatible models in TEI.
Well, the model needs to be ported from torch to Candle, which, depending on the model, can take from an hour up to a day. We do not have an equivalent of AutoModel.
In the specific case of MPNet, the only thing missing is relative attention. It will be slow because a relative position bias is added to the Q·Kᵀ scores, therefore you cannot use flash attention.
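To make the flash-attention point concrete, here is a toy sketch of relative-attention-biased scores: a bias indexed by the relative position i − j is added to the raw Q·Kᵀ entries before softmax, which is exactly the step a fused softmax(Q·Kᵀ/√d)·V kernel has no hook for. Sizes, values, and the offset-keyed bias table here are illustrative, not MPNet's exact bucketing scheme:

```python
import math

def attention_scores(q, k, rel_bias):
    """Per-head scores with a relative bias: scores[i][j] = q_i.k_j / sqrt(d) + rel_bias[i - j]."""
    d = len(q[0])
    scores = []
    for i in range(len(q)):
        row = []
        for j in range(len(k)):
            dot = sum(q[i][t] * k[j][t] for t in range(d)) / math.sqrt(d)
            row.append(dot + rel_bias[i - j])  # bias depends only on the offset i - j
        scores.append(row)
    return scores

def softmax(row):
    m = max(row)  # shift for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

# Toy inputs: 2 positions, head dim 2; one bias value per offset -1, 0, +1.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
rel_bias = {-1: 0.1, 0: 0.0, 1: -0.1}
scores = attention_scores(q, k, rel_bias)
probs = [softmax(row) for row in scores]
```

Because the bias lands between the matmul and the softmax, a plain fused attention kernel cannot be used as-is; the kernel would need explicit support for an additive bias term.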