llama-cpp-python
openai API `n` argument is ignored
## Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [X] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [X] I carefully followed the README.md.
- [X] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [X] I reviewed the Discussions, and have a new bug or useful enhancement to share.
## Current Behavior
I'm running the llama-cpp-python server with the following command:

```shell
python3 -m llama_cpp.server --model models/mys/ggml_llava-v1.5-13b/ggml-model-q4_k.gguf --clip_model_path models/mys/ggml_llava-v1.5-13b/mmproj-model-f16.gguf --model_alias llava-v1.5-13b-q4_k --chat_format llava-1-5 --port 10322
```
(models downloaded from https://huggingface.co/mys/ggml_llava-v1.5-13b/tree/main)
When I call the server using the `openai` Python package:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:10322/v1",  # "http://<Your api-server IP>:port"
    api_key="sk-no-key-required",
)

chat_completion = client.chat.completions.create(
    model="models/mys/ggml_llava-v1.5-13b/ggml-model-q4_k.gguf",
    messages=[
        {"role": "user", "content": "Write a limerick about python exceptions"}
    ],
    n=2,
)
print(len(chat_completion.choices))  # Returns 1, but should be 2.
```
According to the OpenAI API reference, the `n` argument is: "How many chat completion choices to generate for each input message."

It seems to be ignored by the server.
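Until `n` is honored server-side, one possible workaround is to emulate it on the client by issuing `n` separate requests and collecting one choice from each. This is only a sketch; `n_completions` is a hypothetical helper, not part of the `openai` package or llama-cpp-python:

```python
def n_completions(client, n, **kwargs):
    """Hypothetical helper: emulate the `n` parameter by issuing `n`
    separate chat-completion requests and collecting the first choice
    from each response. Works with any OpenAI-compatible client object.
    Note: n sequential requests cost n full generations server-side."""
    return [
        client.chat.completions.create(**kwargs).choices[0]
        for _ in range(n)
    ]

# Usage against the server started above (assumed setup):
# from openai import OpenAI
# client = OpenAI(base_url="http://localhost:10322/v1", api_key="sk-no-key-required")
# choices = n_completions(
#     client, 2,
#     model="models/mys/ggml_llava-v1.5-13b/ggml-model-q4_k.gguf",
#     messages=[{"role": "user", "content": "Write a limerick about python exceptions"}],
# )
# print(len(choices))  # 2
```

The choices won't share a prompt cache across requests, so this is slower than a real server-side `n`, but it returns the expected number of completions.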
## Environment and Context
- llama-cpp-python installed with `pip install llama-cpp-python[server]`
- `llama_cpp.__version__`: 0.3.6
- `openai.__version__`: 1.59.7