llama.cpp Feature Request: Add support for ChatGLM3 in the example server.

Prerequisites

[X] I am running the latest code. Mention the version if possible as well.
[X] I carefully followed the README.md.
[X] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
[X] I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

ChatGLM3 uses a completely new prompt format. See https://github.com/THUDM/ChatGLM3/blob/main/PROMPT_en.md

I have created a patch, https://github.com/ggerganov/llama.cpp/commit/fd3492e85836c0df4b0404a47355159f4c349a44, for examples/server/public/prompt-formats.js.

Motivation

Fixes chat errors, repetitions, and role reversals when chatting with ChatGLM3 through the example server.

Possible Implementation

From ChatGLM3 README:

Overall Structure

The format of the ChatGLM3 dialogue consists of several conversations, each of which contains a dialogue header and content. A typical multi-turn dialogue structure is as follows:

<|system|>
You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user's instructions carefully. Respond using markdown.
<|user|>
Hello
<|assistant|>
Hello, I'm ChatGLM3. What can I assist you today?
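To illustrate the layout quoted above, here is a minimal Python sketch that renders a list of chat messages into this format. The helper name `build_chatglm3_prompt`, the message dict shape, the trailing open `<|assistant|>` header, and the `STOP_STRINGS` list are all hypothetical, chosen for illustration. The sketch follows only the structure shown in the excerpt. It does not reproduce any special tokens or exact whitespace that the official ChatGLM3 tokenizer may add, and it is not the content of the linked prompt-formats.js patch.

```python
from typing import Dict, List

# Role headers that appear in the ChatGLM3 excerpt above.
ROLES = {"system", "user", "assistant"}

# Hypothetical stop strings: halting at the next user/system header keeps the
# model from writing the other side of the conversation.
STOP_STRINGS = ["<|user|>", "<|system|>"]


def build_chatglm3_prompt(messages: List[Dict[str, str]],
                          add_generation_prompt: bool = True) -> str:
    """Render messages as '<|role|>' header lines followed by content lines.

    Hypothetical helper: mirrors only the layout in the excerpt and adds no
    special prefix tokens that the real tokenizer may use.
    """
    parts = []
    for msg in messages:
        role = msg["role"]
        if role not in ROLES:
            raise ValueError(f"unsupported role: {role!r}")
        # Header on its own line, content on the following line(s).
        parts.append(f"<|{role}|>\n{msg['content']}")
    if add_generation_prompt:
        # Leave an assistant turn open so the model writes the reply.
        parts.append("<|assistant|>\n")
    return "\n".join(parts)


if __name__ == "__main__":
    prompt = build_chatglm3_prompt([
        {"role": "system", "content": "You are ChatGLM3, a large language model trained by Zhipu.AI."},
        {"role": "user", "content": "Hello"},
    ])
    print(prompt)
```

Ending the prompt with an open `<|assistant|>` header and stopping generation at the next role header is one way to avoid the role reversals described in the Motivation; whether the server's web UI needs exactly these stop strings is an assumption to check against the patch.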

Aug 25 '24 04:08 themanyone

AFAIK, support for GLM3 and GLM4 has already been added: https://github.com/ggerganov/llama.cpp/pull/8031

Aug 27 '24 09:08 ngxson

Those are completely different files. https://github.com/ggerganov/llama.cpp/pull/8031 was for the CLI version (which some other projects, like ollama, also use or build into a server) and for GGUF creation. This issue is for the web UI of the example server, which lets you choose a chat template when you run ./llama-server from the llama.cpp GitHub repo and navigate to http://localhost:port in the browser.

Aug 28 '24 04:08 themanyone

This issue was closed because it has been inactive for 14 days since being marked as stale.

Oct 12 '24 01:10 github-actions[bot]