
Embedding server crashes when used with langchain openai embeddings

Open voorhs opened this issue 9 months ago • 2 comments

Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that reproduces the bug.

If the bug concerns the server, please try to reproduce it first using the server test scenario framework.

The snippet:

from langchain_openai import OpenAIEmbeddings
embedding=OpenAIEmbeddings(model="-", api_key="sk-no-key-required", base_url="http://localhost:8666")
embedding.embed_documents(['hello there'])

Logs from server:

{"tid":"140695133081600","timestamp":1715435598,"level":"INFO","function":"update_slots","line":1807,"msg":"all slots are idle"}
{"tid":"140695133081600","timestamp":1715435604,"level":"INFO","function":"launch_slot_with_task","line":1036,"msg":"slot is processing task","id_slot":0,"id_task":0}
terminate called after throwing an instance of 'nlohmann::json_abi_v3_11_3::detail::type_error'
  what():  [json.exception.type_error.302] type must be number, but is array

After that, the server terminates.
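A plausible reading of the error (an assumption inferred from the log, not confirmed in the thread): langchain-openai pre-tokenizes documents with tiktoken and posts arrays of token IDs as `input`, while the plain openai client call below posts raw strings. A minimal sketch of the two request-body shapes, with illustrative (made-up) token IDs:

```python
import json

# Suspected langchain-openai payload: "input" holds arrays of token IDs.
# Token IDs here are illustrative placeholders, not real tiktoken output.
langchain_body = json.dumps({"model": "-", "input": [[15339, 1070]]})

# Payload sent by the plain openai client: "input" holds raw strings.
openai_body = json.dumps({"model": "-", "input": ["hello there"]})

# The server log says "type must be number, but is array", which matches it
# encountering the nested list in the first body where it expected a number.
print(json.loads(langchain_body)["input"][0])  # -> [15339, 1070] (an array)
print(json.loads(openai_body)["input"][0])     # -> hello there (a string)
```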

Server is launched in server-cuda container:

docker run --gpus all -v ./llm-gguf:/models -p 8666:8000 -e "CUDA_VISIBLE_DEVICES=2" local/llama.cpp:server-cuda -m /models/GritLM-7B-Q4_K_M.gguf --port 8000 --host 0.0.0.0 --n-gpu-layers 32 --embeddings

When used with the openai Python API, everything works fine:

import openai

client = openai.OpenAI(
    base_url="http://localhost:8666",
    api_key="sk-no-key-required"
)

client.embeddings.create(input=['hello mister'], model='-').data[0].embedding
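If the token-ID payload is indeed the cause, a possible workaround (an assumption: `check_embedding_ctx_length=False` is the langchain-openai option that skips local tiktoken tokenization, so documents are sent as plain strings, matching the openai client above):

```python
from langchain_openai import OpenAIEmbeddings  # assumes langchain-openai is installed

embedding = OpenAIEmbeddings(
    model="-",
    api_key="sk-no-key-required",
    base_url="http://localhost:8666",
    check_embedding_ctx_length=False,  # send raw strings instead of token-ID arrays
)
# embedding.embed_documents(["hello there"])  # run with the server up
```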

System: Ubuntu 20.04.6 LTS, NVIDIA A100-40GB

voorhs avatar May 11 '24 14:05 voorhs

I have exactly the same bug on a Mac Studio M1 Max (latest OS), using the Hermes 2 Pro Llama 8B model.

stygmate avatar May 11 '24 14:05 stygmate

Same issue with ./server -m bge-m3-f16.gguf --embedding (CPU only, Intel Xeon E5-2650).

Edit: as with the openai Python API, running with the same parameters under llamafile doesn't produce the error.

delphinebelnand avatar May 13 '24 12:05 delphinebelnand

Same issue with llama-3-8b-instruct.gguf converted from the official Llama 3 8B model.

code-wangshuyi avatar May 17 '24 02:05 code-wangshuyi

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar Jul 01 '24 01:07 github-actions[bot]