Update main's interactive mode to use the chat handshake template support already available in llama.cpp (and currently only used by server, ...)
Currently, the interactive mode of main does not, by default, add any tags to identify system or user messages to the model.
One currently has to either
- use the separate chatml mode to work specifically with ChatML-supporting models, or
- pass --in-prefix, --in-suffix and --reverse-prompt arguments as required to try and match the chat template the model expects.
This PR adds a generic chat mode to main that can make use of any chat template already implemented in llama_chat_apply_template_internal, which is currently used by the server logic but not by main.
To help with this, a new chaton.hpp file is added to common, which contains (sketched below)
- llama_chat_apply_template_simple, a wrapper around llama_chat_apply_template (and in turn llama_chat_apply_template_internal) of llama.cpp, and
- llama_chat_reverse_prompt, which adds any reverse prompts needed for the requested template standard.
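A rough sketch of what the wrapper could look like; the exact names and signatures in chaton.hpp may differ, this only illustrates the idea of formatting a single role/content pair against a named template via the public llama_chat_apply_template API:

```cpp
// Illustrative sketch, not the PR's actual code: format one role/content pair
// using the named template via llama_chat_apply_template.
#include <string>
#include <vector>
#include "llama.h"

inline std::string llama_chat_apply_template_simple(
        const std::string & tmpl,     // template id, e.g. "chatml" or "llama2"
        const std::string & role,     // "system", "user" or "assistant"
        const std::string & content,  // the message text
        bool add_ass) {               // append the assistant prefix so the model replies
    llama_chat_message msg = { role.c_str(), content.c_str() };
    std::vector<char> buf(content.size() * 2 + 1024);
    // llama_chat_apply_template returns the required length, or < 0 if the
    // template is not recognised by llama_chat_apply_template_internal.
    int32_t res = llama_chat_apply_template(nullptr, tmpl.c_str(), &msg, 1, add_ass,
                                            buf.data(), (int32_t) buf.size());
    if (res < 0) {
        return "";
    }
    if ((size_t) res > buf.size()) {
        buf.resize(res);
        res = llama_chat_apply_template(nullptr, tmpl.c_str(), &msg, 1, add_ass,
                                        buf.data(), (int32_t) buf.size());
    }
    return std::string(buf.data(), res);
}
```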
To add a new chat handshake template, remember to add the needed logic to both
- llama_chat_apply_template_internal (llama.cpp) and
- llama_chat_reverse_prompt (common/chaton.hpp), roughly along the lines of the sketch below.
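For the reverse-prompt side, supporting a new template could then be a matter of adding another branch, along these lines (a sketch; the marker strings shown are illustrative, not necessarily the ones used in the PR):

```cpp
// Hypothetical sketch of llama_chat_reverse_prompt: map a template id to the
// end-of-turn marker(s) that main should watch for in interactive mode.
#include <string>
#include <vector>

inline bool llama_chat_reverse_prompt(const std::string & tmpl,
                                      std::vector<std::string> & antiprompts) {
    if (tmpl == "chatml") {
        antiprompts.push_back("<|im_start|>user");  // illustrative marker
        return true;
    }
    if (tmpl == "llama2") {
        antiprompts.push_back("</s>");              // illustrative marker
        return true;
    }
    // Any other template already handled by llama_chat_apply_template_internal
    // still needs its reverse prompt registered here.
    return false;
}
```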
To use this support, pass -i and --chaton TEMPLATE_ID to main. The currently supported templates are chatml and llama2; for the other chat handshake template standards already supported by llama_chat_apply_template_internal, suitable reverse prompts still need to be added to llama_chat_reverse_prompt.
With the attached patch applied on top of this PR, I can also chat with llama3 using main -i --chaton llama3.
This sounds like an excellent and much-needed addition to main. Did you add a flag for specifying the system role's message?
I've done detailed research on the same subject, so I strongly recommend referring to this issue: https://github.com/ggerganov/llama.cpp/issues/6391
Also, a new function named llama_token_is_eog will be introduced along with llama3 support in the other PR, so it may be better to wait for that.
> This sounds like an excellent and much-needed addition to main. Did you add a flag for specifying the system role's message?
In interactive mode (i.e. -i), any prompt file (-f) or prompt (-p) passed on the command line is treated as the system prompt, and this PR in turn formats it to match the system-prompt template expected by the model.
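Illustratively, the wiring in main could look something like the following, building on the helper sketches above (params.chaton_template_id is a hypothetical field name, not necessarily the one used in the PR; params.prompt and params.antiprompt are the existing gpt_params fields):

```cpp
// Sketch only: wrap the -p/-f prompt as a system message when --chaton is active.
if (params.interactive && !params.chaton_template_id.empty()) {
    // params.prompt holds whatever came from -p or -f; re-emit it wrapped in the
    // system-role markers of the requested template.
    params.prompt = llama_chat_apply_template_simple(
            params.chaton_template_id, "system", params.prompt, /*add_ass*/ false);
    // Register the template's end-of-turn marker so generation stops at user turns.
    llama_chat_reverse_prompt(params.chaton_template_id, params.antiprompt);
}
```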
Here's the patch running llama3 with --verbose-prompt. I think there might be too many newlines?
main: prompt: '<|start_header_id|>system<|end_header_id|>
You are an assistant
<|eot_id|>
'
main: number of tokens in prompt = 11
128006 -> ''
9125 -> 'system'
128007 -> ''
198 -> '
'
2675 -> 'You'
527 -> ' are'
459 -> ' an'
18328 -> ' assistant'
198 -> '
'
128009 -> ''
271 -> '
'
main: static prompt based on n_keep: 'system
You are an assistant
'
main: interactive mode on.
Reverse prompt: '<|eot_id|>'
128009 -> ''
Without --verbose-prompt:
system
You are an assistant
>
There is a new PR, again an experiment, which tries to use a simple-minded json file to drive the logic, so that many aspects can be controlled by editing the json file rather than needing to update the code.
https://github.com/ggerganov/llama.cpp/pull/6834
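For illustration only, such a json-driven approach might look roughly like this; the actual file layout and helper names in that PR may well differ:

```cpp
// Purely illustrative: keep per-template begin/end markers in a json file and
// look them up at runtime instead of hard-coding them.
#include <fstream>
#include <string>
#include "json.hpp"   // nlohmann::json, already bundled under common/ in llama.cpp

using json = nlohmann::json;

// Example file contents (hypothetical layout):
// { "llama3": { "system-begin": "<|start_header_id|>system<|end_header_id|>\n",
//               "system-end":   "<|eot_id|>",
//               "reverse-prompt": "<|eot_id|>" } }
inline std::string chaton_meta_lookup(const std::string & path,
                                      const std::string & tmpl,
                                      const std::string & key) {
    std::ifstream f(path);
    json meta = json::parse(f);
    return meta.at(tmpl).at(key).get<std::string>();
}
```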