Fix v1/chat/completions Gibberish API Responses

Open · keldenl opened this issue on Apr 07 '23 · 8 comments

The chat completion API, specifically in the FastAPI server, wasn't doing a very consistent job of completing chats. The results consistently contained gibberish (like `​\nA\n/imagine prompt: User is asking about`, or just references to the system message in general), so I went ahead and tweaked the prompt (it was also oddly formatted, which probably confused the text generation even more).

Here it is before and after with the default example (running vicuna-13B unfiltered):

Before Prompt

```
### Instructions:Complete the following chat conversation between the user and the assistant. System messages should be strictly followed as additional instructions.

### Inputs:system None: You are a helpful assistant.
user None: What is the capital of France?

### Response:
assistant:
```

Results

```json
{
  "id": "chatcmpl-8d9ce5a6-841d-4568-acbe-67ea9640954a",
  "object": "chat.completion",
  "created": 1680854923,
  "model": "../llama.cpp/models/vicuna/13B/ggml-vicuna-unfiltered-13b-4bit.bin",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "​\nA\n/imagine prompt: User is asking about the capital of France, Assistant should provide a clear and concise answer, perhaps mentioning some interesting facts about the city or its history. The response should be friendly and helpful, using positive language and encouraging further questions. It should also include some basic information about Paris, such as its location in the north of France, its famous landmarks or cultural attractions, or its population and history.\n\n"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 70,
    "completion_tokens": 98,
    "total_tokens": 168
  }
}
```

After Prompt

```
### Instructions:
Complete the following chat conversation between the user and the assistant. System messages should be strictly followed as additional instructions.

system None: You are a helpful assistant.
user None: What is the capital of France?

### Response:
assistant:
```

Results

```json
{
  "id": "chatcmpl-35a2850c-e9cd-445b-ad63-046cb98cb107",
  "object": "chat.completion",
  "created": 1680854743,
  "model": "../llama.cpp/models/vicuna/13B/ggml-vicuna-unfiltered-13b-4bit.bin",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": " The capital of France is Paris.\n"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 61,
    "completion_tokens": 12,
    "total_tokens": 73
  }
}
```
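
For illustration, here is a minimal sketch of how the fixed template assembles OpenAI-style messages into the prompt above. The helper name and structure are hypothetical, not the exact code from this PR:

```python
def format_chat_prompt(messages):
    """Build a single completion prompt from OpenAI-style chat messages.

    Hypothetical sketch of the fixed template shown above; the real
    implementation lives in the server code and may differ in detail.
    """
    lines = [
        "### Instructions:",
        "Complete the following chat conversation between the user and the"
        " assistant. System messages should be strictly followed as"
        " additional instructions.",
        "",
    ]
    for message in messages:
        # The "None" in the transcripts above appears to be each message's
        # unset optional name field.
        lines.append(f"{message['role']} {message.get('name')}: {message['content']}")
    lines += ["", "### Response:", "assistant:"]
    return "\n".join(lines)


prompt = format_chat_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
])
```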

I also followed the general guidance on default parameters for chat in https://www.reddit.com/r/LocalLLaMA/comments/11o6o3f/how_to_install_llama_8bit_and_4bit/ to improve the results; a rough sketch of such defaults follows.
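
This is only a sketch: the parameter names are from llama-cpp-python's completion API, but the values are illustrative assumptions, not quoted from the linked post:

```python
# Illustrative chat-friendly sampling defaults; tune per model.
# These values are assumptions for the sketch, not the PR's exact settings.
chat_defaults = {
    "temperature": 0.7,     # lower temperature keeps answers focused
    "top_p": 0.9,           # nucleus sampling cutoff
    "top_k": 40,            # sample only from the 40 most likely tokens
    "repeat_penalty": 1.1,  # discourage the repetition loops behind the gibberish
}
```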

Also added some macOS-specific entries to .gitignore, which helps with contributing.

keldenl · Apr 07 '23 08:04

@keldenl any hints where one could find an unfiltered vicuna grazing? asking for a friend ...

jmtatsch · Apr 07 '23 08:04

> @keldenl any hints where one could find an unfiltered vicuna grazing? asking for a friend ...

hug.. some.. faces?

keldenl · Apr 07 '23 09:04

Thanks for the contribution! I'll try to address this in a more general way with https://github.com/abetlen/llama-cpp-python/issues/17 by allowing you to load multiple models and set defaults based on the specific model.

abetlen · Apr 08 '23 00:04

Also, I haven't tested the vicuna model yet, but it looks very promising; I've found that using alpaca for chat is less than ideal.

abetlen · Apr 08 '23 07:04

Vicuna has given me some good results. I've tweaked chat-ui (a ChatGPT clone that uses the OpenAI API) and been able to run the FastAPI server against it! The chat is pretty good, other than the slower generation due to the lack of a chat mode :/

keldenl · Apr 08 '23 07:04

@keldenl awesome. Yeah, now that the Mac install bugs are fixed, improving chat speed is definitely next on my list.

abetlen · Apr 08 '23 07:04

Let me know if I can help in parallel in any way 😀

keldenl · Apr 08 '23 07:04

Related to this: currently the chat completion prompt returns gibberish if the system prompt "You are a helpful assistant." is not set. It would be great if it could be omitted, as in the actual OpenAI API.

Niek · Apr 12 '23 10:04

Update?

gjmulder · May 23 '23 14:05

I think the issue is that you now need to specify the chat_format correctly ... it won't guess anymore.

earonesty · Nov 03 '23 20:11

@earonesty correct, this is all handled now by the chat format and chat handler APIs.

abetlen · Nov 21 '23 09:11
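
For anyone landing on this issue later, here is a minimal sketch of setting the chat format explicitly with llama-cpp-python's Llama API. The model path is illustrative, and the chat_format string must match your model's fine-tuning:

```python
from llama_cpp import Llama

# chat_format is no longer guessed; it must match the model's fine-tuning.
llm = Llama(
    model_path="./models/ggml-vicuna-unfiltered-13b-4bit.bin",  # illustrative path
    chat_format="vicuna",
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
)
print(response["choices"][0]["message"]["content"])
```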