Sentence Transformers based mpnet models
Feature request
The Sentence Transformers-based mpnet models are pretty popular for fast and cheap embeddings. It would be really helpful to support these models, at a minimum those using the MPNet architecture, within the text embeddings interface.
Motivation
As sentence-transformers/multi-qa-mpnet-base-cos-v1 shows, these models are still pretty popular. For anyone using them in production with a semantic search stack, it'd be preferable to support these models with a much faster inference stack than to migrate to a new embedding model that might have different performance characteristics.
Your contribution
Once a CONTRIBUTING.md is written (it does not appear to exist at the time of writing) I can likely find time to contribute to making this happen. I would benefit from some details about code organization and style preferences.
Ah geez, I saw there's a different category for this type of request. Sorry.
+1 from me. Since SentenceTransformers recommends all-mpnet-base-v2, I imagine lots of projects, like ours, are using that model.
Currently, trying to run the Docker image with `--model-id sentence-transformers/all-mpnet-base-v2` gives me:
```
2023-10-25T13:37:12.148493Z INFO text_embeddings_router: router/src/main.rs:246: Starting model backend
Error: Could not create backend
Caused by:
Could not start backend: missing field `type_vocab_size` at line 23 column 1
```
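For context, the error above comes from a strict config parser rejecting a `config.json` that omits a field. The following is a minimal sketch of that failure mode, assuming a required-field check like the one the message implies; only `type_vocab_size` is taken from the actual error, and the field list, default value, and parsing logic are illustrative rather than TEI's real code:

```python
import json

# Illustrative required fields; only `type_vocab_size` comes from the
# error message above, the other names are placeholders.
REQUIRED_FIELDS = ("hidden_size", "num_attention_heads", "type_vocab_size")

def parse_config_strict(raw: str) -> dict:
    """Reject a config that omits any required field (mirrors the error)."""
    cfg = json.loads(raw)
    for field in REQUIRED_FIELDS:
        if field not in cfg:
            raise ValueError(f"missing field `{field}`")
    return cfg

def parse_config_lenient(raw: str) -> dict:
    """Fill in a BERT-style default when the field is absent."""
    cfg = json.loads(raw)
    cfg.setdefault("type_vocab_size", 2)  # assumed default, for illustration only
    return cfg

# An MPNet-style config without `type_vocab_size` fails strict parsing
# but passes once a default is filled in.
mpnet_like = '{"hidden_size": 768, "num_attention_heads": 12}'
```

Under this reading, the fix is either adding the missing field to the model's `config.json` or having the backend tolerate its absence.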
> Since SentenceTransformers recommends all-mpnet-base-v2, I imagine lots of projects, like ours, are using that model.
Worth adding that all-mpnet-base-v2 is the most downloaded model on the Hub for the "Sentence Similarity" task, with over 10M downloads just last month: https://huggingface.co/models?pipeline_tag=sentence-similarity&sort=downloads
@OlivierDehaene sorry for the direct ping, but could you share a spitball guess on how hard it would be to add support for all-mpnet-base-v2? Wondering if this is something we could try, wait for, or give up on.
It should be ok to add. Candle and PyTorch do not have feature parity, but Candle should have everything needed to support it. You won't be able to use flash attention though, so it will be slower than other models.
> You won't be able to use flash attention though, so it will be slower than other models.
Thanks for mentioning that. In that case it may be worth biting the bullet and migrating to BGE now.
@OlivierDehaene can you shed some light on what needs to be done to make any embedding model compatible with text-embeddings-inference?
This will help the community pitch in to grow the set of compatible models in TEI.
Well, the model needs to be ported from torch to Candle, which, depending on the model, can take from an hour up to a day. We do not have an equivalent of AutoModel.
In the specific case of MPNet, the only thing missing is relative attention. It will be slow because a relative position bias is added to the Q·Kᵀ scores, therefore you cannot use flash attention.
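To make the flash-attention point concrete, here is a toy sketch of relative-attention-biased scores: a bias indexed by the relative position i − j is added to the raw Q·Kᵀ entries before softmax, which is exactly the step a fused softmax(Q·Kᵀ/√d)·V kernel has no hook for. Sizes, values, and the offset-keyed bias table here are illustrative, not MPNet's exact bucketing scheme:

```python
import math

def attention_scores(q, k, rel_bias):
    """Per-head scores with a relative bias: scores[i][j] = q_i.k_j / sqrt(d) + rel_bias[i - j]."""
    d = len(q[0])
    scores = []
    for i in range(len(q)):
        row = []
        for j in range(len(k)):
            dot = sum(q[i][t] * k[j][t] for t in range(d)) / math.sqrt(d)
            row.append(dot + rel_bias[i - j])  # bias depends only on the offset i - j
        scores.append(row)
    return scores

def softmax(row):
    m = max(row)  # shift for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

# Toy inputs: 2 positions, head dim 2; one bias value per offset -1, 0, +1.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
rel_bias = {-1: 0.1, 0: 0.0, 1: -0.1}
scores = attention_scores(q, k, rel_bias)
probs = [softmax(row) for row in scores]
```

Because the bias lands between the matmul and the softmax, a plain fused attention kernel cannot be used as-is; the kernel would need explicit support for an additive bias term.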