Update main's interactive mode to use the chat handshake template support already available in llama.cpp (and currently only used by server, ...)
Currently, the interactive mode of main does not, by default, add any tags to identify system or user messages to the model.
One currently has to either
- use the separate chatml mode to work specifically with ChatML-supporting models, or
- pass --in-prefix, --in-suffix and --reverse-prompt arguments as required to try and match the chat template the model expects.
This PR adds a generic chat mode to main that can make use of any chat template already implemented in llama_chat_apply_template_internal, which is currently used by the server logic but not by main.
To help with this, a new chaton.hpp file is added to common, which contains (sketched below)
- llama_chat_apply_template_simple, a wrapper around llama_chat_apply_template (and in turn llama_chat_apply_template_internal) of llama.cpp, and
- llama_chat_reverse_prompt, which adds any reverse prompts needed for the requested template standard.
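A rough sketch of what the wrapper could look like; the exact names and signatures in chaton.hpp may differ, this only illustrates the idea of formatting a single role/content pair against a named template via the public llama_chat_apply_template API:

```cpp
// Illustrative sketch, not the PR's actual code: format one role/content pair
// using the named template via llama_chat_apply_template.
#include <string>
#include <vector>
#include "llama.h"

inline std::string llama_chat_apply_template_simple(
        const std::string & tmpl,     // template id, e.g. "chatml" or "llama2"
        const std::string & role,     // "system", "user" or "assistant"
        const std::string & content,  // the message text
        bool add_ass) {               // append the assistant prefix so the model replies
    llama_chat_message msg = { role.c_str(), content.c_str() };
    std::vector<char> buf(content.size() * 2 + 1024);
    // llama_chat_apply_template returns the required length, or < 0 if the
    // template is not recognised by llama_chat_apply_template_internal.
    int32_t res = llama_chat_apply_template(nullptr, tmpl.c_str(), &msg, 1, add_ass,
                                            buf.data(), (int32_t) buf.size());
    if (res < 0) {
        return "";
    }
    if ((size_t) res > buf.size()) {
        buf.resize(res);
        res = llama_chat_apply_template(nullptr, tmpl.c_str(), &msg, 1, add_ass,
                                        buf.data(), (int32_t) buf.size());
    }
    return std::string(buf.data(), res);
}
```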
To add a new chat handshake template, remember to add the needed logic to both
- llama_chat_apply_template_internal (llama.cpp) and
- llama_chat_reverse_prompt (common/chaton.hpp), roughly along the lines of the sketch below.
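For the reverse-prompt side, supporting a new template could then be a matter of adding another branch, along these lines (a sketch; the marker strings shown are illustrative, not necessarily the ones used in the PR):

```cpp
// Hypothetical sketch of llama_chat_reverse_prompt: map a template id to the
// end-of-turn marker(s) that main should watch for in interactive mode.
#include <string>
#include <vector>

inline bool llama_chat_reverse_prompt(const std::string & tmpl,
                                      std::vector<std::string> & antiprompts) {
    if (tmpl == "chatml") {
        antiprompts.push_back("<|im_start|>user");  // illustrative marker
        return true;
    }
    if (tmpl == "llama2") {
        antiprompts.push_back("</s>");              // illustrative marker
        return true;
    }
    // Any other template already handled by llama_chat_apply_template_internal
    // still needs its reverse prompt registered here.
    return false;
}
```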
To use this support, pass -i and --chaton TEMPLATE_ID to main. The currently supported templates are chatml and llama2; for the other chat handshake template standards already supported by llama_chat_apply_template_internal, suitable reverse prompts still need to be added to llama_chat_reverse_prompt.
With the attached patch applied on top of this PR, I can also chat with llama3 using main -i --chaton llama3.
This sounds like an excellent and much-needed addition to main. Did you add a flag for specifying the system role's message?
I've done detailed research on the same subject, so I strongly recommend referring to this issue: https://github.com/ggerganov/llama.cpp/issues/6391
Also, a new function named llama_token_is_eog will be introduced along with llama3 support in the other PR, so it may be better to wait for that.
> This sounds like an excellent and much-needed addition to main. Did you add a flag for specifying the system role's message?
In interactive mode (i.e. -i), any prompt file (-f) or prompt (-p) passed on the command line is treated as the system prompt, and this PR in turn formats it to match the system-prompt template expected by the model.
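Illustratively, the wiring in main could look something like the following, building on the helper sketches above (params.chaton_template_id is a hypothetical field name, not necessarily the one used in the PR; params.prompt and params.antiprompt are the existing gpt_params fields):

```cpp
// Sketch only: wrap the -p/-f prompt as a system message when --chaton is active.
if (params.interactive && !params.chaton_template_id.empty()) {
    // params.prompt holds whatever came from -p or -f; re-emit it wrapped in the
    // system-role markers of the requested template.
    params.prompt = llama_chat_apply_template_simple(
            params.chaton_template_id, "system", params.prompt, /*add_ass*/ false);
    // Register the template's end-of-turn marker so generation stops at user turns.
    llama_chat_reverse_prompt(params.chaton_template_id, params.antiprompt);
}
```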
Here's the patch running llama3 with --verbose-prompt. I think there might be too many newlines?
main: prompt: '<|start_header_id|>system<|end_header_id|>
You are an assistant
<|eot_id|>
'
main: number of tokens in prompt = 11
128006 -> ''
9125 -> 'system'
128007 -> ''
198 -> '
'
2675 -> 'You'
527 -> ' are'
459 -> ' an'
18328 -> ' assistant'
198 -> '
'
128009 -> ''
271 -> '
'
main: static prompt based on n_keep: 'system
You are an assistant
'
main: interactive mode on.
Reverse prompt: '<|eot_id|>'
128009 -> ''
Without --verbose-prompt:
system
You are an assistant
>
There is a new PR, again an experiment, which tries to use a simple-minded json file to drive the logic, so that many aspects can be controlled by editing the json file rather than needing to update the code.
https://github.com/ggerganov/llama.cpp/pull/6834
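For illustration only, such a json-driven approach might look roughly like this; the actual file layout and helper names in that PR may well differ:

```cpp
// Purely illustrative: keep per-template begin/end markers in a json file and
// look them up at runtime instead of hard-coding them.
#include <fstream>
#include <string>
#include "json.hpp"   // nlohmann::json, already bundled under common/ in llama.cpp

using json = nlohmann::json;

// Example file contents (hypothetical layout):
// { "llama3": { "system-begin": "<|start_header_id|>system<|end_header_id|>\n",
//               "system-end":   "<|eot_id|>",
//               "reverse-prompt": "<|eot_id|>" } }
inline std::string chaton_meta_lookup(const std::string & path,
                                      const std::string & tmpl,
                                      const std::string & key) {
    std::ifstream f(path);
    json meta = json::parse(f);
    return meta.at(tmpl).at(key).get<std::string>();
}
```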