llama-cpp-python Fix v1/chat/completions Gibberish API Responses

Open keldenl opened this issue 1 year ago • 8 comments

The chat completion API in the FastAPI server wasn't completing chats consistently. The results were frequently gibberish (like "\nA\n/imagine prompt: User is asking about", or just references to the system message in general), so I went ahead and tweaked the prompt (it was also weirdly formatted, which probably confused the text generation even more).

Here it is before and after with the default example (running vicuna-13B unfiltered):

Before Prompt

 

### Instructions:Complete the following chat conversation between the user and the assistant. System messages should be strictly followed as additional instructions.

### Inputs:system None: You are a helpful assistant.
user None: What is the capital of France?

### Response:
assistant:

Results

{
  "id": "chatcmpl-8d9ce5a6-841d-4568-acbe-67ea9640954a",
  "object": "chat.completion",
  "created": 1680854923,
  "model": "../llama.cpp/models/vicuna/13B/ggml-vicuna-unfiltered-13b-4bit.bin",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "\nA\n/imagine prompt: User is asking about the capital of France, Assistant should provide a clear and concise answer, perhaps mentioning some interesting facts about the city or its history. The response should be friendly and helpful, using positive language and encouraging further questions. It should also include some basic information about Paris, such as its location in the north of France, its famous landmarks or cultural attractions, or its population and history.\n\n"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 70,
    "completion_tokens": 98,
    "total_tokens": 168
  }
}

After Prompt

### Instructions:
Complete the following chat conversation between the user and the assistant. System messages should be strictly followed as additional instructions.

system None: You are a helpful assistant.
user None: What is the capital of France?

### Response:
assistant:

Results

{
  "id": "chatcmpl-35a2850c-e9cd-445b-ad63-046cb98cb107",
  "object": "chat.completion",
  "created": 1680854743,
  "model": "../llama.cpp/models/vicuna/13B/ggml-vicuna-unfiltered-13b-4bit.bin",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": " The capital of France is Paris.\n"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 61,
    "completion_tokens": 12,
    "total_tokens": 73
  }
}
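For reference, here is a minimal sketch of how the "After" prompt above could be assembled from the request messages. The function name `build_prompt` and the use of an optional `name` field are assumptions made for illustration, not the server's actual code; the literal "None" in the output is reproduced only because it appears in the prompts shown above.

```python
from typing import Dict, List

# Instruction text copied from the "After Prompt" above.
INSTRUCTIONS = (
    "Complete the following chat conversation between the user and the assistant. "
    "System messages should be strictly followed as additional instructions."
)


def build_prompt(messages: List[Dict[str, str]]) -> str:
    """Hypothetical helper: format chat messages like the "After" prompt."""
    # Assumption: the "None" in the prompts above is a missing optional
    # name field that was interpolated directly into the string.
    chat_history = "\n".join(
        f'{m["role"]} {m.get("name")}: {m["content"]}' for m in messages
    )
    # The header sits on its own line and there is no "### Inputs:" section,
    # which are the two differences from the "Before" prompt.
    return (
        f"### Instructions:\n{INSTRUCTIONS}\n\n"
        f"{chat_history}\n\n"
        "### Response:\nassistant:"
    )


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]
print(build_prompt(messages))
```

Dropping the "None" when no name is given would probably be cleaner still, but that goes beyond what was changed here.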

I also followed the general guidance around default parameters for chatting in https://www.reddit.com/r/LocalLLaMA/comments/11o6o3f/how_to_install_llama_8bit_and_4bit/ to help with results as well.

I also added some macOS-specific entries to .gitignore, which helps with contributing.

Apr 07 '23 08:04 keldenl

@keldenl any hints where one could find an unfiltered vicuna grazing? asking for a friend ...

Apr 07 '23 08:04 jmtatsch

@keldenl any hints where one could find an unfiltered vicuna grazing? asking for a friend ...

hug.. some.. faces?

Apr 07 '23 09:04 keldenl

Thanks for the contribution. I'll try to address this in a more general way with https://github.com/abetlen/llama-cpp-python/issues/17 by allowing you to load multiple models and set defaults based on the specific model.

Apr 08 '23 00:04 abetlen

Also, I haven't tested out the vicuna model yet, but it looks very promising. I've found that using alpaca for chat is less than ideal.

Apr 08 '23 07:04 abetlen

Vicuña has given me some good results. I've tweaked the chat-ui (a ChatGPT clone that uses the OpenAI API) and been able to run the FastAPI server against it! The chat is pretty good, other than the slower generation due to the lack of chat mode :/

Apr 08 '23 07:04 keldenl

@keldenl awesome, yeah, now that the mac install bugs are fixed, improving chat speed is definitely next on my list

Apr 08 '23 07:04 abetlen

lmk if i can help in parallel in any way 😀

Apr 08 '23 07:04 keldenl

Related to this - currently the completion prompt returns gibberish if the system prompt "You are a helpful assistant." is not set. It would be great if this could be omitted, similar to the actual OpenAI API.

Apr 12 '23 10:04 Niek

Update?

May 23 '23 14:05 gjmulder

i think the issue is you now need to specify the chat_format correctly ... it won't guess anymore.

Nov 03 '23 20:11 earonesty

@earonesty correct, this is all handled correctly now by the chat format and chat handler APIs.

Nov 21 '23 09:11 abetlen
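For anyone landing here later, a minimal sketch of what setting the chat format explicitly might look like with the Python API. The helper names `load_chat_model` and `ask` are made up for this example, the model path is a placeholder, and "vicuna" is assumed to be the right format name for the model discussed in this thread; check the formats supported by your installed version.

```python
from typing import Any


def load_chat_model(model_path: str, chat_format: str = "vicuna") -> Any:
    """Hypothetical helper: load a model with an explicit chat format."""
    # Imported lazily so ask() below can be used without the package installed.
    from llama_cpp import Llama

    # chat_format selects the prompt template for chat completions.
    return Llama(model_path=model_path, chat_format=chat_format)


def ask(llm: Any, question: str, system: str = "You are a helpful assistant.") -> str:
    """Hypothetical helper: send one user question and return the reply text."""
    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
    response = llm.create_chat_completion(messages=messages)
    # The response follows the same shape as the JSON results shown above.
    return response["choices"][0]["message"]["content"]


# Usage (the path is a placeholder, not a real file):
# llm = load_chat_model("path/to/model.gguf")
# print(ask(llm, "What is the capital of France?"))
```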
