Is it possible to add an OpenAI-compatible embedding endpoint?
I was wondering whether there's any possibility of adding an OpenAI-compatible embeddings endpoint, or whether that wouldn't make sense with the underlying models.
My goal is to swap llamafile in as an alternative backend for our OpenAI-based RAG chat. It works great for /chat/completions, but I also need an embeddings endpoint to support vector searches.
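For context, here's roughly what our client code looks like (a minimal sketch using the openai Python package; the base URL, port, and model names are placeholders for our setup):

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # llamafile server; port is an assumption
    api_key="sk-no-key-required",         # llamafile doesn't check the key
)

# This part already works against llamafile today:
chat = client.chat.completions.create(
    model="LLaMA_CPP",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(chat.choices[0].message.content)

# This is the call we'd also like to point at llamafile:
emb = client.embeddings.create(
    model="text-embedding-ada-002",  # what we use with OpenAI today
    input="a query to embed for vector search",
)
print(len(emb.data[0].embedding))
```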
Thanks!
The server currently has its own /embedding API, but it'd be nice to have an OpenAI-compatible endpoint too. Anyone else who wants this, please leave a comment telling us so. This would also be a good thing to propose to the upstream project, since if they do it before us, it'll get merged here the next time we do a sync.
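For anyone who lands here in the meantime, the existing native endpoint can be called like this (a minimal sketch assuming the default port 8080; the request/response shape follows the llama.cpp server README):

```python
import requests

# POST to the server's native (non-OpenAI) embedding route.
resp = requests.post(
    "http://localhost:8080/embedding",
    json={"content": "Hello, world!"},
)
resp.raise_for_status()
vector = resp.json()["embedding"]  # a single list of floats
print(len(vector))
```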
I'd very much appreciate this as well
I think this would make so much sense to add and would be totally in line with Mozilla's "healthy internet" mission.
It would make it so much easier to adapt tools originally designed only for OpenAI to be usable in a local context too, helping grow an ecosystem around llamafile.
Totally agree. I have been playing with LangChain, and interestingly it works with llamafile. I got stuck when using OpenAIEmbeddings from LangChain. It would be great to have an OpenAI-compliant embeddings endpoint, so that many retrieval-based tasks can be performed using llamafile.
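Roughly what I tried, for reference (a sketch, not exact code; the base_url assumes llamafile's default port, and check_embedding_ctx_length is turned off to skip OpenAI-specific tiktoken token counting):

```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    base_url="http://localhost:8080/v1",  # llamafile server, assumed port
    api_key="sk-no-key-required",         # llamafile ignores the key
    check_embedding_ctx_length=False,     # send raw strings, not token arrays
)

# Fails today because llamafile has no OpenAI-compatible /v1/embeddings yet.
vectors = embeddings.embed_documents(["first document", "second document"])
```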
Definitely would be a welcome addition, yes! :+1:
Edit: There is already an issue in llama.cpp about lots of features that should go into the server; I added a comment there about this issue.
The OpenAI-compatible embeddings endpoint is directly mentioned there, I now realize: https://github.com/ggerganov/llama.cpp/issues/4216#issuecomment-1858542650
llama.cpp has had this for three weeks now: https://github.com/ggerganov/llama.cpp/commit/c82d18e863fcde91b4b1109b1d0c73ea4470c405, so I guess it will be available the next time llama.cpp is synced. Note that you also have to pass --embedding to the server, or it will generate empty vectors (https://github.com/ggerganov/llama.cpp/blob/5207b3fbc500f89dfe528693e96540956dbaed96/examples/server/README.md?plain=1#L35). As of about two weeks ago it also supports BERT models: https://github.com/ggerganov/llama.cpp/pull/5423. I was able to convert https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 with llama.cpp's convert-hf-to-gguf.py script and use it just fine.
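To illustrate, once that lands in llamafile, something like this should work (a sketch assuming a server started with --embedding on port 8080, and the OpenAI-style response shape from the upstream commit):

```python
import requests

# OpenAI-compatible route added upstream in llama.cpp's server.
resp = requests.post(
    "http://localhost:8080/v1/embeddings",
    json={"input": "Hello, world!"},
)
resp.raise_for_status()
data = resp.json()["data"]  # list of {"embedding": [...], "index": ...}
print(len(data[0]["embedding"]))
```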
Hi @jart @stlhood, I wanted to follow up to see when the next llama.cpp sync release of llamafile might be available, so it can pick up the upstream OpenAI-compatible embeddings endpoint:
https://github.com/ggerganov/llama.cpp/pull/5190