
Get chat_template from a server endpoint.

lastrosade opened this issue 4 months ago · 9 comments

Feature Description

Retrieve the "chat_template" field from the GGUF model in the /props endpoint.

Motivation

Many models incorporate a jinja2 template stored in a field called chat_template. This feature would enable users to generate appropriate templates with their scripts.

This could also probably be used on the web UI to autofill the template text boxes.
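As a sketch of how a client script might consume this feature once implemented: the snippet below parses a hypothetical `/props` JSON response and pulls out a `chat_template` field. The field name and response shape are assumptions here, since the endpoint does not expose this yet.

```python
import json
from typing import Optional

def get_chat_template(props_json: str) -> Optional[str]:
    # Hypothetical: assumes the server exposes the GGUF
    # tokenizer.chat_template metadata under a "chat_template" key.
    props = json.loads(props_json)
    return props.get("chat_template")

# Mocked response body, standing in for GET /props:
sample = json.dumps({"chat_template": "{% for m in messages %}{{ m['content'] }}{% endfor %}"})
print(get_chat_template(sample))
```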

lastrosade avatar Feb 10 '24 23:02 lastrosade

Could be nice to just return all the gguf metadata in one go?

Azeirah avatar Feb 11 '24 20:02 Azeirah

I need this too. Currently, the problem is that we cannot access the metadata outside of llama_model_loader (please correct me if I'm wrong)

ngxson avatar Feb 14 '24 11:02 ngxson

There are functions in the llama.h API to read the metadata. It should work with any non-array metadata.

https://github.com/ggerganov/llama.cpp/blob/8084d554406b767d36b3250b3b787462d5dd626f/llama.h#L357-L367

slaren avatar Feb 14 '24 11:02 slaren

@slaren Perfect, thanks. That's exactly what I was missing in https://github.com/ggerganov/llama.cpp/pull/5425

I'm not sure how we can decode the template inside the cpp code. It would be far too complicated to include some kind of "official" parser.

The idea I have in mind is to hard-code some template patterns to detect which type of template it is. In reality, we will mostly have either the llama2 format ([INST]) or chatml (<|im_start|>)

ngxson avatar Feb 15 '24 18:02 ngxson

> The idea that I'm having in my mind is maybe hard code some template patterns to detect if it's which type of template. In reality, we will mostly have either llama2 format ([INST]) or chatml (<|im_start|>)

Yes, exactly. Some simple heuristic checks to detect the most common templates would be great. Should be something very basic and easy to reuse - no need to over-engineer it.
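The heuristic discussed above could look something like this: match on the distinctive marker of each common format. This is only an illustrative sketch (the function name and labels are made up), not the actual llama.cpp implementation.

```python
def detect_template_type(chat_template: str) -> str:
    # Very basic heuristic: check for the distinctive marker
    # of each of the two most common chat formats.
    if "<|im_start|>" in chat_template:
        return "chatml"
    if "[INST]" in chat_template:
        return "llama2"
    return "unknown"
```

Checking for a single substring per format keeps it trivially cheap and easy to extend with one `if` per new template family.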

ggerganov avatar Feb 16 '24 09:02 ggerganov

Would that work for weirder templates like MiniCPM's

```
<用户>
<AI>
```

?

lastrosade avatar Feb 21 '24 18:02 lastrosade

> Would that work for weirder templates like MiniCPM's
>
> ```
> <用户>
> <AI>
> ```
>
> ?

No, not for now, but we can add support for these templates as long as we can find the jinja version.

I couldn't find this template in tokenizer_config.json of MiniCPM-V. Can you find it somewhere?

For now, we can only support templates that are included in tokenizer_config.json. The benefit is that I can run the python code and then the cpp code to compare whether the cpp implementation is correct.

ngxson avatar Feb 21 '24 19:02 ngxson

> I couldn't find this template in tokenizer_config.json of MiniCPM-V. Can you find it somewhere?

I took mine from here https://github.com/ggerganov/llama.cpp/issues/5447#issuecomment-1957784407

```jinja
{% for message in messages %}{% if message['role'] == 'user' %}{{'<用户>' + message['content'].strip() + '<AI>'}}{% else %}{{message['content'].strip()}}{% endif %}{% endfor %}
```
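For reference, that jinja template is simple enough to re-implement in plain Python, which makes it easy to see what it actually emits (function name is illustrative):

```python
def minicpm_format(messages):
    # Mirrors the quoted jinja template: user turns are wrapped as
    # <用户>...<AI>, all other turns are emitted stripped, with no
    # separators, newlines, or BOS/EOS tokens in between.
    out = []
    for message in messages:
        if message["role"] == "user":
            out.append("<用户>" + message["content"].strip() + "<AI>")
        else:
            out.append(message["content"].strip())
    return "".join(out)

print(minicpm_format([
    {"role": "user", "content": "hello"},
    {"role": "assistant", "content": "hi"},
]))
# → <用户>hello<AI>hi
```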

lastrosade avatar Feb 22 '24 05:02 lastrosade

@lastrosade Can you give the link to the official docs somewhere? Pay attention, because templates may differ in their use of newlines, spaces, and EOS/BOS tokens, which is quite confusing.

Your template will output something like: `<用户>hello<AI>hi`

But in reality, it may be: `<用户>hello\n<AI>hi`, or `<用户>\nhello\n</s><s><AI>\nhi`, ...

That's why it's always better to have the official template (the one used in the training process)

ngxson avatar Feb 22 '24 09:02 ngxson

I don't know where to find any official docs, but looking at their repo, it seems that they do not use any special tokens in their template.

https://github.com/OpenBMB/MiniCPM/blob/b3358343cb6cc40002d92bc382ab92b98d5b8f3e/model/modeling_minicpm.py#L1326 But I think this only parses text, so idk.

lastrosade avatar Feb 23 '24 19:02 lastrosade

@lastrosade sorry for the late response, but the current blocking point is that the gguf model does not have a template at all, so it's impossible for the server to detect whether it should use the MiniCPM template or not.

Please join the discussion in the linked issue above.

ngxson avatar Mar 07 '24 13:03 ngxson

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions[bot] avatar Apr 21 '24 01:04 github-actions[bot]