
openai API `n` argument is ignored

[Open] BenjaminMarechalEVITECH opened this issue 10 months ago • 1 comment

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [X] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [X] I carefully followed the README.md.
  • [X] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [X] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Current Behavior

I'm running llama-server with the following command:

python3 -m llama_cpp.server --model models/mys/ggml_llava-v1.5-13b/ggml-model-q4_k.gguf --clip_model_path models/mys/ggml_llava-v1.5-13b/mmproj-model-f16.gguf --model_alias llava-v1.5-13b-q4_k --chat_format llava-1-5 --port 10322

(models downloaded from https://huggingface.co/mys/ggml_llava-v1.5-13b/tree/main)

When I call the server using openai python package:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:10322/v1", # "http://<Your api-server IP>:port"
    api_key = "sk-no-key-required"
)

chat_completion = client.chat.completions.create(
    model="models/mys/ggml_llava-v1.5-13b/ggml-model-q4_k.gguf",
    messages=[
        {"role": "user", "content": "Write a limerick about python exceptions"}
    ],
    n=2,
)
print(len(chat_completion.choices))  # Returns 1, but should be 2.

According to the OpenAI API reference, the `n` argument is "How many chat completion choices to generate for each input message." It seems to be ignored by the server.
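As a workaround sketch (assuming, as observed above, that the server returns exactly one choice per request), `n` can be emulated client-side by issuing `n` separate requests and merging the resulting choices. The helper name `n_choices` is hypothetical, not part of any library:

```python
def n_choices(create_fn, n, **kwargs):
    """Emulate the OpenAI `n` parameter against a server that ignores it.

    Calls `create_fn` (e.g. client.chat.completions.create) `n` times with
    the same keyword arguments and collects the choices from each response.
    Note: this costs n full requests, unlike native `n` support.
    """
    choices = []
    for _ in range(n):
        response = create_fn(**kwargs)
        choices.extend(response.choices)
    return choices
```

With the client from the snippet above, the call in the report would become `n_choices(client.chat.completions.create, 2, model=..., messages=...)` instead of passing `n=2`. Sampling temperature must be non-zero, or the repeated requests may return identical completions.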

Environment and Context

llama_cpp installed with `pip install llama-cpp-python[server]`
print(llama_cpp.__version__): 0.3.6
print(openai.__version__): 1.59.7

BenjaminMarechalEVITECH · Jan 24 '25 20:01