
Is it possible to add an OpenAI-compatible embedding endpoint?

[Open] pamelafox opened this issue 1 year ago • 8 comments

I was wondering if there's any possibility of adding an OpenAI-compatible embedding endpoint, or if that wouldn't make sense with the underlying models.

My goal is to be able to swap llamafile in as an alternative to OpenAI in our RAG chat app. It works great for /chat/completions, but I also need an embeddings endpoint to support vector searches.

Thanks!

pamelafox avatar Jan 04 '24 17:01 pamelafox

The server currently has its own /embedding API, but it'd be nice to have an OpenAI-compatible endpoint too. Anyone else who wants this, please leave a comment telling us so. This would also be a good thing to propose to the upstream project, since if they implement it first, it'll get merged here the next time we do a sync.
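In the meantime, the native endpoint can be called directly. A minimal sketch, assuming the server runs on localhost:8080 and the native /embedding route accepts a JSON body with a `content` field and responds with an `embedding` array, as documented for the upstream llama.cpp server (the exact shape for llamafile is an assumption here):

```python
import json
import urllib.request

def build_request_body(text):
    # Native /embedding request body; the "content" field name follows the
    # upstream llama.cpp server docs (an assumption for llamafile).
    return json.dumps({"content": text}).encode("utf-8")

def embed(text, base_url="http://localhost:8080"):
    # POST to the server's native /embedding route and return the vector.
    req = urllib.request.Request(
        f"{base_url}/embedding",
        data=build_request_body(text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embedding"]
```

An OpenAI-compatible endpoint would instead take `{"input": ..., "model": ...}` at /v1/embeddings, which is why existing OpenAI clients can't hit the native route unmodified.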

jart avatar Jan 04 '24 22:01 jart

I'd very much appreciate this as well

zefhemel avatar Jan 05 '24 05:01 zefhemel

I think this would make so much sense to add, and it would be totally in line with Mozilla's "healthy internet" mission.

It would make it so much easier to adapt tools originally designed only for OpenAI to be usable in a local context too, helping grow an ecosystem around llamafile.

mrchrisadams avatar Jan 05 '24 06:01 mrchrisadams

Totally agree. I have been playing with LangChain and, interestingly, it works with llamafile. I got stuck when using `OpenAIEmbeddings` from LangChain. An OpenAI-compliant embedding endpoint would be great, so that many retrieval-based tasks could be performed using llamafile.

code2k13 avatar Jan 05 '24 08:01 code2k13
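For context on what those retrieval tasks need: once embeddings are available from any endpoint, the search side is just cosine similarity over the vectors. A self-contained sketch (stdlib only; the tiny vectors below are stand-ins for real embedding output, not values from any actual model):

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    # Return indices of the k document vectors most similar to the query.
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]
```

This is the piece a RAG chat app runs after embedding the query and the documents; the endpoint only needs to supply the vectors.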

Definitely would be a welcome addition, yes! :+1:

Edit: There is already an issue in llama.cpp collecting features that should go into the server; I added a comment there about this issue.

stolsvik avatar Jan 07 '24 23:01 stolsvik

I realize the OpenAI-compatible embeddings endpoint is mentioned directly there: https://github.com/ggerganov/llama.cpp/issues/4216#issuecomment-1858542650

stolsvik avatar Jan 07 '24 23:01 stolsvik

llama.cpp has had this for three weeks now (https://github.com/ggerganov/llama.cpp/commit/c82d18e863fcde91b4b1109b1d0c73ea4470c405), so I guess it will be available the next time llama.cpp is updated here. Note that you also have to pass `--embedding` to the server, or it will generate empty vectors (https://github.com/ggerganov/llama.cpp/blob/5207b3fbc500f89dfe528693e96540956dbaed96/examples/server/README.md?plain=1#L35). For about two weeks it has also supported BERT models (https://github.com/ggerganov/llama.cpp/pull/5423); I was able to convert https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 with the llama.cpp convert-hf-to-gguf.py script and use it just fine.

SC-CTS avatar Feb 20 '24 12:02 SC-CTS

Hi @jart @stlhood, I wanted to follow up to ask when the next llamafile sync release might be available, so it can pick up the llama.cpp change adding OpenAI embedding endpoints:

https://github.com/ggerganov/llama.cpp/pull/5190

alexanderchang1 avatar Mar 12 '24 02:03 alexanderchang1