llama.cpp
Embedding server crashes when used with langchain openai embeddings
The snippet:
from langchain_openai import OpenAIEmbeddings
embedding=OpenAIEmbeddings(model="-", api_key="sk-no-key-required", base_url="http://localhost:8666")
embedding.embed_documents(['hello there'])
Logs from server:
{"tid":"140695133081600","timestamp":1715435598,"level":"INFO","function":"update_slots","line":1807,"msg":"all slots are idle"}
{"tid":"140695133081600","timestamp":1715435604,"level":"INFO","function":"launch_slot_with_task","line":1036,"msg":"slot is processing task","id_slot":0,"id_task":0}
terminate called after throwing an instance of 'nlohmann::json_abi_v3_11_3::detail::type_error'
what(): [json.exception.type_error.302] type must be number, but is array
After that, the server stops.
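The exception says the server's JSON parser found an array where it expected a number. One possible explanation, not confirmed anywhere in this thread, is that LangChain's OpenAIEmbeddings pre-tokenizes text by default and sends a list of token-ID lists as `input`, while the plain openai client sends raw strings. The sketch below only classifies the two payload shapes. The helper name `input_shape` and the token IDs are hypothetical and used purely for illustration.

```python
def input_shape(payload: dict) -> str:
    """Classify the `input` field of an OpenAI-style embeddings request body."""
    value = payload["input"]
    if isinstance(value, str):
        return "string"
    if isinstance(value, list) and value:
        if all(isinstance(v, str) for v in value):
            return "list of strings"
        if all(isinstance(v, int) for v in value):
            return "list of token ids"
        if all(
            isinstance(v, list) and all(isinstance(t, int) for t in v)
            for v in value
        ):
            return "list of token-id lists"
    return "unknown"


# Shape sent by the plain openai client in this report: raw strings.
plain = {"model": "-", "input": ["hello mister"]}

# Shape a pre-tokenizing client might send (token IDs are made up).
pretokenized = {"model": "-", "input": [[101, 202, 303]]}

print(input_shape(plain))
print(input_shape(pretokenized))
```

If the server only accepts a string or a flat list of numbers per prompt, the nested shape would be consistent with "type must be number, but is array". Logging the incoming request body on the server side would confirm or rule this out.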
The server is launched from the server-cuda Docker image:
docker run --gpus all -v ./llm-gguf:/models -p 8666:8000 -e "CUDA_VISIBLE_DEVICES=2" local/llama.cpp:server-cuda -m /models/GritLM-7B-Q4_K_M.gguf --port 8000 --host --n-gpu-layers 32 --embeddings
When used with the OpenAI Python API, everything works:
import openai
client = openai.OpenAI(
    api_key="sk-no-key-required",
)
client.embeddings.create(input=['hello mister'], model='-').data[0].embedding
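Since the plain openai client works and the LangChain wrapper does not, a possible workaround is to make LangChain send raw strings as well. The sketch below assumes the crash comes from OpenAIEmbeddings pre-tokenizing its input, which this thread does not confirm, and it has not been tested against this setup. `check_embedding_ctx_length` is an OpenAIEmbeddings parameter. The helper names `embeddings_kwargs` and `make_embeddings` are hypothetical.

```python
def embeddings_kwargs(base_url: str) -> dict:
    """Build constructor arguments for OpenAIEmbeddings (hypothetical helper)."""
    return {
        "model": "-",
        "api_key": "sk-no-key-required",
        "base_url": base_url,
        # Assumption: disabling the context-length check makes LangChain
        # skip client-side tokenization and send the raw text instead.
        "check_embedding_ctx_length": False,
    }


def make_embeddings(base_url: str):
    """Create the LangChain embeddings client; requires langchain_openai."""
    # Imported lazily so the helper above can be used without the package.
    from langchain_openai import OpenAIEmbeddings

    return OpenAIEmbeddings(**embeddings_kwargs(base_url))


# Usage against the server from this report (needs a running server):
#   embedding = make_embeddings("http://localhost:8666")
#   embedding.embed_documents(["hello there"])
```

With the check disabled, LangChain no longer splits long inputs to fit the model's context, so oversized documents would have to be chunked by the caller.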
System: Ubuntu 20.04.6 LTS, NVIDIA A100-40GB
I have exactly the same bug on a Mac Studio M1 Max (latest OS). I'm using the Hermes 2 Pro Llama 8B model.
Same issue with ./server -m bge-m3-f16.gguf --embedding (CPU only, Intel Xeon E5-2650).
Edit: as with the OpenAI Python API, running the same parameters with llamafile doesn't produce the error.
Same issue with llama-3-8b-instruct.gguf, converted from the official Llama 3 8B model.
This issue was closed because it has been inactive for 14 days since being marked as stale.